PREFACE


Nonstationarity  is  another  name  for  intermittency,  a  phenomenon 
which  affects  many  physical  processes  in  the  atmospheric  boundary 
layer.  Among  these  are  the  transfers  of  heat,  momentum,  and  moisture, 
and  thus  the  propagation  of  electromagnetic  waves.  Nonstationarity  also 
appears  in  the  nonlinear  propagation  of  acoustic  waves  and  in  noise 
radiation  by  supersonic  jets  and  helicopters.  Such  signals  are 
commonplace  in  military  communication  systems,  ship  and  submarine 
stealth,  and  maneuvering  and  control  problems.  Also,  data  collected  in 
many  diverse  R&D  programs  frequently  exhibit  nonstationary  features. 
Of  these  a  common  one  is  turbulence  data,  which  we  know  from 
observations  to  be  a  complicated  nonlinear  dynamical  phenomenon  of 
limited  predictability. 

Several  advances  in  the  theory  of  nonstationary  random  processes 
have  been  made  since  the  first  workshop  convened  in  1991.  And 
although  much  research  remains  yet  to  be  done,  it  seems  opportune  to 
convert  some  of  the  existing  work  into  a  more  permanent  form.  It  is 
debatable  what  impact  the  new  knowledge  has  had  on  the  development  of 
practical  models  of  this  important  non-equilibrium  phenomenon.  The 
chief  aim  of  the  workshop  was  therefore  to  provide  a  forum  at  which  the 
recent  important  contributions  could  be  reported  and  discussed  among 
researchers  from  government,  academia,  and  industry. 

The  theme  of  the  workshop  was  all  aspects  of  nonstationary 
analysis,  with  the  appeal  for  participation  being  made  to  engineers, 
scientists,  and  mathematicians  alike.  This  appeal  is  consistent  with 
that  of  the  first  workshop.  The  intent  was  to  create  a  diverse 
environment  for  researchers  working  in  this  genuinely  multidisciplinary 
field  to  mutually  share  their  ideas.  The  premier  objective  of  the 
workshop  was  to  consolidate  recent  developments  in  nonstationary 
analysis  and  present  the  material  in  a  "tutorial  mode."  A  second 
objective  was  to  delineate  open  problems. 

We  would  like  to  express  our  gratitude  to  those  who  assisted  us  in 
convening  the  workshop.  Sincere  thanks  go  to  Jack  Preisser  (NASA 
Langley),  Frank  Halsall  (NSWC-Carderock),  and  Walter  Bach  (Army 
Research  Office),  who  collectively  provided  most  of  the  financial  support 
necessary  to  bring  the  workshop  to  fruition.  We  gratefully  acknowledge 
the  administrative  support  provided  by  Norma  Trevino.  To  the
anonymous  referees  who  reviewed  the  submitted  manuscripts,  and  to 
those  who  served  as  session  chairmen  (Frank  Halsall,  Alan  Piersol  and 
Ken  Bolland),  we  are  also  grateful.  Most  of  all,  though,  we  express  our 
deep  appreciation  to  the  authors,  whose  hard  work  and  dedication  made 
the  workshop  a  success. 

George  Trevino 
Jay  Hardin 
Bruce  Douglas 
Edgar  Andreas

ABSTRACT

This  paper  deals  with  a  large  class  of  nonstationary  stochastic  processes 
generated  by  passing  white  noise  through  a  general  linear  time-varying 
system.  It  is  shown  that  such  processes  can  be  characterized  in  terms 
of  a  family  of  wide-sense  stationary  processes.  This  formulation  is  used 
to  define  the  power  spectrum  for  the  nonstationary  processes  considered 
in  the  paper.  Then  the  power  spectrum  is  used  to  give  a  suboptimal 
solution  to  the  matched  filtering  problem  for  the  case  of  nonstationary 
interference.  This  suboptimal  solution  is  shown  to  be  nearly  optimal 
under  conditions  corresponding  to  a  sufficiently  small  rate  of  variation. 
A  numerical  example  compares  this  suboptimal  solution  to  an  optimal 
solution. 


1.  Introduction 

Matched filters are used extensively in radar and communication systems to detect
the presence of a signal buried in noise. For example, in a radar application, a sinusoidal
pulse is transmitted, reflected back from a target, and received. A matched filter is used
to detect the arrival of the received signal, and the arrival time can be used to determine
the target's distance. Early work on matched filters, dating back to the 1940s, treated
the case of stationary interference and ideal (time-invariant) propagation channels [1].
Matched filtering in the case of nonideal (time-varying) channels (RAKE filtering) has
received considerable attention since it was first treated by Price and Green [2] in 1958.

In recent years applications for detecting signals buried in nonstationary noise have
been identified. For example, radar and communication systems affected by countermeasures
must combat nonstationary interference. Recently Modestino and Melendez [3]
presented a matched filter which treats nonstationary interference using techniques based
on a local covariance matrix. In this paper we construct a matched filter for the case of
nonstationary interference using frequency-domain techniques.

We  consider  interference  signals  from  the  general  class  of  nonstationary  stochastic 
processes  generated  by  passing  white  noise  through  a  causal  linear  system  as  shown  in 
Figure  1.  More  precisely,  we  consider  the  nonstationary  process  v(n)  given  by  the 
superposition  summation 


    v(n) = \sum_{m=-\infty}^{n} g(n,m) e(m)    (1)

where e(m) is zero-mean unit-variance white noise and g(n,m) is the linear system's
impulse-response function, defined as the system response at index n to an impulse applied
at index m.
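
As a rough illustration of the superposition sum (1), the sketch below passes white noise through a causal time-varying system. The particular impulse response `g`, its `decay` parameter, and the truncation length `memory` are hypothetical choices made only for this example; they are not taken from the paper.

```python
import numpy as np

def g(n, m, decay=0.8):
    """Hypothetical causal impulse response: g(n, m) = 0 for m > n."""
    if m > n:
        return 0.0
    gain = 1.0 + 0.5 * np.sin(2.0 * np.pi * n / 64.0)  # slow variation in n
    return gain * decay ** (n - m)

def synthesize(e, memory=60):
    """Evaluate v(n) = sum_{m<=n} g(n, m) e(m), truncated to `memory` lags.

    The input e is taken to be zero before index 0.
    """
    v = np.zeros(len(e))
    for n in range(len(e)):
        for m in range(max(0, n - memory), n + 1):
            v[n] += g(n, m) * e[m]
    return v

rng = np.random.default_rng(0)
e = rng.standard_normal(256)   # zero-mean, unit-variance white noise
v = synthesize(e)              # one realization of the nonstationary process
```

Feeding a unit impulse at index m into `synthesize` returns g(n, m) as a function of n, which matches the definition of the impulse-response function given above.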


[Block diagram: e(n) -> g(n,m) -> v(n)]

Figure 1. Block diagram of signal model.


We restrict our attention to processes with finite-valued autocorrelation functions
r_v(i+n,i) = E[v(i+n)v(i)] by requiring that the impulse-response function g(n,m) be square
summable over m for each value of n; that is, there exists a function c(n), with c(n) < \infty,
such that

    \sum_{m=-\infty}^{n} |g(n,m)|^2 < c(n)    (2)

We  now  formally  define  the  class  of  processes  considered  in  this  paper. 

Definition  1.  A  random  process  v(n)  generated  by  (1),  where  the  impulse-response 
function  g(n,m)  satisfies  (2),  is  said  to  be  purely  nondeterministic  (PND). 

Cramer [4] extended the results of Wold by showing that any process with finite-valued
mean and autocorrelation functions can be decomposed into its deterministic (as
defined in [Cramer 1961]) and PND parts. Therefore, we are considering all such
processes for which the deterministic component is zero.

The problem of detecting signals buried in nonstationary PND noise is described
in Section 2. In Section 3 we show that every PND nonstationary process has
an associated family of jointly wide-sense stationary processes. This formulation leads
to the notion of power spectrum for a class of PND processes, and it is also used to
formulate a "clairvoyant" detection problem in Section 4. The optimal solution to the
clairvoyant detection problem is given in terms of the power spectrum. In Section 5 the
"clairvoyant" solution is applied to the detection problem of Section 2. This suboptimal
solution is shown to be nearly optimal under slowly varying conditions. A computer
simulation presented in Section 6 compares the suboptimal solution with the optimal
clairvoyant solution.


2.  The  Detection  Problem 

In the detection problem, a measured process r(n,n_d) = s(n-n_d) + v(n) is composed
of a known signal s(n) and a corrupting interference v(n) as shown in Figure 2. The
signal component s(n) is delayed by the unknown quantity n_d, and the interference v(n)
is a nonstationary PND process with signal model given by the impulse-response function
g(n,m). The objective is to detect the signal s(n) and determine the most likely value for
the delay n_d from the measured process r(n,n_d).

The received signal r(n,n_d) is passed through a linear filter with impulse-response
function h(n,m) to generate the response y(n,n_d) given by

    y(n,n_d) = \sum_{m=-\infty}^{n} h(n,m) r(m,n_d) = y_s(n,n_d) + y_v(n)

where the noise component y_v(n) is given by

    y_v(n) = \sum_{m=-\infty}^{n} h(n,m) v(m)

and the signal component y_s(n,n_d) is given by

    y_s(n,n_d) = \sum_{m=-\infty}^{n} h(n,m) s(m-n_d)


[Block diagram with labels s(n), s(n-n_d), and y(n,n_d)]

Figure 2. Block diagram of detection problem.


As in the stationary case, the detection filter is to compress the signal energy in
time, thereby generating a response that reaches a maximum amplitude at a particular time
index n_p. The time index at which the maximum occurs is then used to find the delay n_d.
Solving this problem requires finding the filter h_M(n,m) such that with h(n,m) = h_M(n,m)
the signal-to-noise ratio defined by

    R(n,n_d) = \frac{|y_s(n,n_d)|^2}{r_{y_v}(n,n)}

is maximized at the specific index n = n_p = n_d + n_0, where n_0 is a design parameter and

    r_{y_v}(n,n) = E[y_v(n) y_v(n)]

The delay is then calculated as n_d = n_p - n_0.

It is well known that if the impulse-response function g(n,m) of the signal model
for v(n) is time invariant so that v(n) is stationary, then the detection filter which
maximizes the signal-to-noise ratio is given by

    H_M(e^{j\omega}) = \frac{S^*(e^{j\omega}) e^{-j\omega n_0}}{S_v(e^{j\omega})}

where H_M(e^{j\omega}) is the frequency-response function of the filter h_M(n,m), S(e^{j\omega}) is the
Fourier transform of the known signal s(n), S_v(e^{j\omega}) is the power spectrum of the stationary
noise v(n), and n_0 is a design parameter chosen to ensure a causal solution. Given the
relationship between H_M(e^{j\omega}) and S(e^{j\omega}) for the stationary case, the optimal detection
filter is commonly called a matched filter.
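
The stationary matched filter can be sketched numerically by sampling the frequency-domain expression on an FFT grid. In the example below the pulse shape, the grid size `N`, the delay `n_d`, and the white-noise spectrum are assumptions chosen for illustration; the function name `matched_filter` is likewise hypothetical.

```python
import numpy as np

def matched_filter(s, S_v, n0):
    """Impulse response of H_M = conj(S) exp(-j w n0) / S_v on an FFT grid.

    S_v holds the noise power spectrum sampled at w = 2*pi*k/N, N = len(S_v);
    it is assumed real, positive, and symmetric so that the result is real.
    """
    N = len(S_v)
    w = 2.0 * np.pi * np.arange(N) / N
    S = np.fft.fft(s, N)
    H = np.conj(S) * np.exp(-1j * w * n0) / S_v
    return np.real(np.fft.ifft(H))

N, n_d, n0 = 512, 128, 31
s = np.sin(np.arange(32) * np.pi / 4)      # known pulse
r = np.zeros(N)
r[n_d:n_d + 32] = s                        # noise-free delayed signal
h = matched_filter(s, np.ones(N), n0)      # white-noise special case
y = np.convolve(r, h)[:N]                  # detector response
n_p = int(np.argmax(y))                    # index of the response peak
delay_estimate = n_p - n0
```

With a flat spectrum the filter reduces to the time-reversed pulse, h(n) = s(n0 - n), so choosing n0 as the pulse length minus one gives a causal filter, and the response peaks at n_p = n_d + n0.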

Unfortunately,  in  many  cases  finding  the  optimal  solution  is  extremely  difficult. 
In  this  paper  we  seek  a  "viable"  suboptimal  solution.  The  suboptimal  solution  is  based 
on the power spectrum defined in the next section.


3.  Power  Spectra  for  PND  Processes 

In  this  section  we  define  the  power  spectrum  for  a  class  of  PND  nonstationary 
processes.  We  begin  by  showing  that  every  nonstationary  PND  process  has  associated 
with  it  a  family  of  jointly  wide-sense  stationary  processes. 

Consider a PND nonstationary process v(n) with signal model given by the
impulse-response function g(n,m). Introduce the integer k by defining g_k(n) = g(k,k-n). For
a fixed integer k, let g_k(n) be the impulse-response function of a linear time-invariant
(LTI) system. The collection of impulse-response functions {g_k(n)} forms a family of LTI
systems indexed by k. A family of processes {v_k(n)} associated with the nonstationary
process v(n) is defined by Sills [5] in terms of the {g_k(n)} by

    v_k(n) = \sum_{i=-\infty}^{n} g_k(n-i) e(i)

We can make three immediate observations regarding the associated family of
processes. First, the nonstationary process v(n) can be calculated from the {v_k(n)} by
evaluating the {v_k(n)} at k = n; that is,

    v(n) = v_k(n) |_{k=n} = v_n(n)

Second, the family of processes {v_k(n)} is jointly wide-sense stationary (WSS), and
hence the family of crosscorrelation functions {r_{v_k v_m}(n)} is defined as

    r_{v_k v_m}(n) = E[v_k(i+n) v_m(i)]    (3)

Third, the autocorrelation of the nonstationary process v(n) is related to the
family of crosscorrelations {r_{v_k v_m}(n)} by

    r_v(n,m) = r_{v_n v_m}(n-m)

The finiteness of these correlation functions is ensured by (2).
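
The first observation, that the nonstationary process is recovered on the diagonal k = n of the associated family, can be checked numerically. The sketch below uses a hypothetical impulse response `g` (a first-order decay whose rate drifts with n) and assumes the white noise is zero before index 0; neither choice comes from the paper.

```python
import numpy as np

def g(n, m):
    """Hypothetical causal model: a first-order decay whose rate drifts with n."""
    if m > n:
        return 0.0
    a = 0.5 + 0.3 * np.tanh(n / 50.0)
    return a ** (n - m)

def g_k(k, n):
    """Frozen-time LTI impulse response g_k(n) = g(k, k - n)."""
    return g(k, k - n)

def v_direct(e, n):
    """v(n) from the superposition sum (1), with e(m) = 0 for m < 0."""
    return sum(g(n, m) * e[m] for m in range(n + 1))

def v_family(e, k, n):
    """v_k(n) = sum_i g_k(n - i) e(i): white noise through the k-th LTI system."""
    return sum(g_k(k, n - i) * e[i] for i in range(n + 1))

rng = np.random.default_rng(1)
e = rng.standard_normal(100)
# Evaluating the family on the diagonal k = n recovers the original process.
diag_ok = all(np.isclose(v_direct(e, n), v_family(e, n, n)) for n in range(100))
```

Each member `v_family(e, k, .)` for fixed k is the output of a time-invariant filter, which is why the family is jointly wide-sense stationary even though the diagonal process is not.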

For the remainder of the paper we will consider PND processes for which the
impulse-response function g(n,m) is absolutely summable in m for each n; that is, we now
assume that there exists a function c(n), with c(n) < \infty for all n, such that

    \sum_{m=-\infty}^{n} |g(n,m)| < c(n)    (4)

Consequently, we are now considering a subclass of PND processes since (4) is more
constraining than (2). The additional constraint given in (4) ensures the existence of the
power spectrum defined below.

For any fixed integers k and m, the joint power spectrum between v_k(n) and v_m(n)
is defined by

    S_{v_k v_m}(e^{j\omega}) = \sum_{i=-\infty}^{\infty} r_{v_k v_m}(i) e^{-j\omega i}    (5)

where the crosscorrelation r_{v_k v_m}(i) is given by (3). The convergence of the sum in (5)
is guaranteed by (4). We can now state the following result, which is used in Section 5.

Proposition 2. The joint power spectrum S_{v_k v_m}(e^{j\omega}) defined by (5) can be calculated
using the relationship

    S_{v_k v_m}(e^{j\omega}) = G(e^{j\omega},k) G^*(e^{j\omega},m)

for any integers k and m, where G(e^{j\omega},k) is defined by

    G(e^{j\omega},k) = \sum_{i=0}^{\infty} g(k,k-i) e^{-j\omega i}    (6)

The quantity G(e^{j\omega},k) defined by (6) is the frequency-response function of the signal
model and was first studied by Zadeh [6] in 1950. Like the impulse-response function, it
completely characterizes the signal model.
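
A small numerical sketch of the frequency-response function (6) and the product form in Proposition 2 follows. It assumes a hypothetical signal model in which g(k, k-i) is a geometric sequence with a slowly drifting pole, so that the truncated sum can be compared with the closed-form geometric series; the function names `pole` and `G` are illustrative only.

```python
import numpy as np

def pole(k):
    """Hypothetical slowly drifting pole; g(k, k - i) = pole(k)**i for i >= 0."""
    return 0.5 + 0.3 * np.tanh(k / 50.0)

def G(w, k, terms=400):
    """Truncated version of (6): sum_{i=0}^{terms-1} g(k, k-i) exp(-j w i)."""
    i = np.arange(terms)
    return np.sum(pole(k) ** i * np.exp(-1j * np.outer(w, i)), axis=1)

w = np.linspace(-np.pi, np.pi, 257)
G_sum = G(w, k=20)
G_closed = 1.0 / (1.0 - pole(20) * np.exp(-1j * w))   # geometric series
S_cross = G(w, 20) * np.conj(G(w, 80))                # Proposition 2 with k=20, m=80
S_auto = np.abs(G_sum) ** 2                           # the k = m case
```

The cross spectrum `S_cross` is complex in general, while the k = m case is real and nonnegative.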

It follows from (5) that the autocorrelation function r_v(n,m) of the nonstationary
process v(n) can be expressed as

    r_v(n,m) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{v_n v_m}(e^{j\omega}) e^{j\omega(n-m)} d\omega    (7)

Equation (7) provides some justification for the following definition of the power
spectrum for the nonstationary processes considered in this paper.

Definition 3. Given a PND nonstationary process v(n) with signal model g(n,m)
satisfying (4), the power spectrum S_v(e^{j\omega},n) of the nonstationary process v(n) is defined
by

    S_v(e^{j\omega},n) = S_{v_n v_n}(e^{j\omega})    (8)

where S_{v_n v_n}(e^{j\omega}) is defined by (5).

The power spectrum defined here is equal in value to the power spectrum defined
by Tjøstheim [7], but Tjøstheim's definition has a different interpretation. Tjøstheim's
power spectrum is defined to satisfy certain operator-theoretic properties. Several other
attempts have been made to define an appropriate power spectrum for nonstationary
processes. These include an instantaneous power spectrum by Page [8], the Wigner-Ville
distribution [9], and the evolutionary spectrum by Priestley [10]. For a summary of
frequency-domain techniques for nonstationary processes see Trevino [11].

In 1968 Loynes [12] listed the properties which a power spectrum for nonstationary
processes should satisfy. Although all of the definitions mentioned partially satisfy this
list, a power spectrum which satisfies all of these properties has not yet been discovered.
Some of the properties proposed by Loynes which are satisfied by the power spectrum
S_v(e^{j\omega},n) defined by (8) are listed next.

Proposition 4. The power spectrum S_v(e^{j\omega},n) defined by (8) is real valued and
nonnegative with S_v(e^{j\omega},n) = S_v(e^{-j\omega},n) for all integers n and \omega \in \mathbb{R}
(where \mathbb{R} is the set of real numbers).

Proposition 5. The instantaneous power in the process v(n) at index n is equal to the
instantaneous power in the spectrum S_v(e^{j\omega},n) at index n; that is,

    E[v^2(n)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_v(e^{j\omega},n) d\omega
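
Proposition 5 is essentially Parseval's relation applied to the frozen-time impulse response, since by Proposition 2 the spectrum at index n equals |G(e^{j\omega},n)|^2. The sketch below checks this for a hypothetical, truncated impulse response; the model in `g_row` and the grid size are assumptions made for the example.

```python
import numpy as np

def g_row(n, terms=64):
    """Hypothetical frozen-time response g(n, n - i), i = 0..terms-1."""
    a = 0.5 + 0.3 * np.tanh(n / 50.0)
    return a ** np.arange(terms)

def power_time(n):
    """E[v^2(n)] = sum_m g(n, m)^2 for unit-variance white input."""
    return np.sum(g_row(n) ** 2)

def power_freq(n, nfft=256):
    """(1/2pi) * integral of |G(e^{jw}, n)|^2, as a mean over a uniform grid."""
    G = np.fft.fft(g_row(n), nfft)     # samples of (6) at w = 2*pi*k/nfft
    return np.mean(np.abs(G) ** 2)

powers = [(power_time(n), power_freq(n)) for n in (-100, 0, 100)]
```

Because the impulse response is truncated to fewer terms than FFT points, the grid average equals the time-domain sum up to rounding, so each pair in `powers` should agree.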

As Loynes pointed out, in generalizing any theory to cover a more general
situation, an obvious requirement is that the new theory should be consistent with the
earlier formulation. As shown below, this is the case for the power spectrum S_v(e^{j\omega},n)
defined by (8).

Proposition 6. If the signal model for the process v(n) is LTI then the process v(n) is
WSS and the generalized power spectrum S_v(e^{j\omega},n) reduces to the ordinary power
spectrum S_v(e^{j\omega}).

Another of the requirements set forth by Loynes is that the power spectrum should
transform simply when the process is transformed linearly. Consider the process y(n)
formed by passing the process v(n), with signal model defined by the frequency-response
function G(e^{j\omega},n), through the linear filter with frequency-response function H(e^{j\omega},n) as
shown in Figure 3.

Proposition 7. The power spectrum of y(n) is given by

    S_y(e^{j\omega},n) = |\Gamma(e^{j\omega},n)|^2 S_v(e^{j\omega},n)    (9)

where

    \Gamma(e^{j\omega},n) = \sum_{i=-\infty}^{n} h(n,i) \frac{G(e^{j\omega},i)}{G(e^{j\omega},n)} e^{-j\omega(n-i)}

and h(n,i) is the impulse-response function of the filter with frequency-response function
H(e^{j\omega},n).

Note that if the cascaded filters in Figure 3 are time invariant then \Gamma(e^{j\omega},n) =
H(e^{j\omega},n) = H(e^{j\omega}). In this case (9) reduces to the well-known result S_y(e^{j\omega}) =
|H(e^{j\omega})|^2 S_v(e^{j\omega}) for the stationary case.

Although the relationship given by equation (9) is simple, using it to find S_y(e^{j\omega},n)
requires finding \Gamma(e^{j\omega},n), which can be very difficult. Therefore, we consider the
simplifying approximation given by the quantity \hat{S}_y(e^{j\omega},n) = |H(e^{j\omega},n)|^2 S_v(e^{j\omega},n). We
will use this approximation in Section 5. We now conclude this section by investigating
the accuracy of this approximation as a function of the rate of variation in v(n).

Consider the quantity G(e^{j\omega},e^{j\phi}) defined by

    G(e^{j\omega},e^{j\phi}) = \sum_{n=-\infty}^{\infty} G(e^{j\omega},n) e^{-j\phi n}

Clearly if the signal model G(e^{j\omega},n) is time invariant, then the process v(n) is stationary
and G(e^{j\omega},e^{j\phi}) = G(e^{j\omega},0) \delta(\phi) for -\pi < \phi < \pi. Kailath [13] observes that the frequency
variable \phi in G(e^{j\omega},e^{j\phi}) corresponds to the rate of variation in the signal model. It is
reasonable to conclude that as the bandwidth of G(e^{j\omega},e^{j\phi}) in \phi increases so does the rate
of variation in the signal model.

Assume that there exist finite constant values B_G, B_H, and B_\phi such that

    |G(e^{j\omega},n)| < B_G,    |H(e^{j\omega},n)| < B_H    (10)

    \frac{1}{2\pi} \int_{-\pi}^{\pi} |\phi| |G(e^{j\omega},e^{j\phi})| d\phi < B_\phi    (11)

for all integers n and \omega \in \mathbb{R}. Since it would appear reasonable to regard the smallest
value of B_\phi satisfying (11) as corresponding to the bandwidth of G(e^{j\omega},e^{j\phi}) in \phi, it in
turn also corresponds to the rate of variation for the process v(n).

It will also be assumed that there exists a finite constant value B_h such that

    \sum_{i=0}^{\infty} |i| |h(n,n-i)| < B_h    (12)

for all integers n. Roughly speaking, the smallest value B_h for which (12) holds
corresponds to the maximum duration of the impulse-response function h(n,n-i). Now the
following result can be stated.

Proposition 8. Let B_G, B_H, B_\phi, and B_h be given by (10), (11), and (12) respectively.
Then with B_G, B_H, and B_h fixed, |\hat{S}_y(e^{j\omega},n) - S_y(e^{j\omega},n)| \to 0 uniformly in \omega and n as
|B_\phi| \to 0.
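Proof. First define \Psi(e^{j\omega},n) by

    \Psi(e^{j\omega},n) = \Gamma(e^{j\omega},n) G(e^{j\omega},n) - H(e^{j\omega},n) G(e^{j\omega},n)

Now it is easily verified that

    \Gamma(e^{j\omega},n) G(e^{j\omega},n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} H(e^{j(\omega+\phi)},n) G(e^{j\omega},e^{j\phi}) e^{j\phi n} d\phi    (13)

Using the Mean-Value Theorem, (13) can be expressed as

    \Gamma(e^{j\omega},n) G(e^{j\omega},n) = H(e^{j\omega},n) G(e^{j\omega},n)
        + \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{i=0}^{\infty} (-j\phi i) h(n,n-i) e^{-j(\omega+\gamma\phi) i} G(e^{j\omega},e^{j\phi}) e^{j\phi n} d\phi

where 0 < \gamma < 1. Using the bounds given by (11) and (12), it follows that

    |\Psi(e^{j\omega},n)| < B_h B_\phi    (14)

From (14) it follows easily that

    |\hat{S}_y(e^{j\omega},n) - S_y(e^{j\omega},n)| < B_h B_\phi (B_h B_\phi + 2 B_G B_H)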


[Block diagram: G(e^{j\omega},n) produces v(n), which is passed through H(e^{j\omega},n) to produce y(n)]

Figure 3. Block diagram of G(e^{j\omega},n) cascaded with H(e^{j\omega},n).


The  power  spectrum  is  used  to  give  the  optimal  solution  to  the  clairvoyant 
detection  problem  described  in  the  next  section,  and  in  Section  5  it  is  used  to  give  a 
suboptimal  solution  to  the  detection  problem  described  in  Section  2. 

4.  Clairvoyant  Detection 

Consider the clairvoyant detection problem depicted in Figure 4, where we
somehow are given the family of processes {r_k(n,n_d) = s(n-n_d) + v_k(n)} associated with the
measured process r(n,n_d). For a fixed integer k, the received signal r_k(n,n_d) is passed
through the filter h_k(n) to generate the response y_k(n,n_d) given by

    y_k(n,n_d) = \sum_{m=-\infty}^{n} h_k(n-m) r_k(m,n_d) = y_{s_k}(n,n_d) + y_{v_k}(n)    (15)

The objective in this problem is to design the family of time-invariant filters {h_{M_k}(n)}
such that with h_k(n) = h_{M_k}(n), the family of signal-to-noise ratios R_k(n,n_d) defined by

    R_k(n,n_d) = \frac{|y_{s_k}(n,n_d)|^2}{r_{y_{v_k}}(n,n)}    (16)

are maximized at the specific index n = n_p = n_d + n_0, where n_0 is a design parameter and

    r_{y_{v_k}}(n,n) = E[y_{v_k}(n) y_{v_k}(n)]


[Block diagram with labels s(n) and s(n-n_d)]

Figure 4. Block diagram of clairvoyant detection problem.


The  optimal  solution  to  this  clairvoyant  detection  problem  is  given  by  the  following 
theorem. 

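Theorem 9. With h_k(n) = h_{M_k}(n), the family of time-invariant filters which
maximizes the family of signal-to-noise ratios R_k(n,n_d) at the index n = n_p = n_d + n_0 is given
by

    H_{M_k}(e^{j\omega}) = \frac{S^*(e^{j\omega}) e^{-j\omega n_0}}{S_v(e^{j\omega},k)}

where S(e^{j\omega}) is the Fourier transform of s(n) and

    H_{M_k}(e^{j\omega}) = \sum_{n=-\infty}^{\infty} h_{M_k}(n) e^{-j\omega n}

Proof. Write (16) in the modified form

    R_k(n,n_d) = \frac{ \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} H_k(e^{j\omega}) \sqrt{S_v(e^{j\omega},k)} \cdot \frac{S(e^{j\omega}) e^{j\omega(n-n_d)}}{\sqrt{S_v(e^{j\omega},k)}} d\omega \right|^2 }{ \frac{1}{2\pi} \int_{-\pi}^{\pi} |H_k(e^{j\omega})|^2 S_v(e^{j\omega},k) d\omega }

Employing the Schwarz inequality gives

    R_k(n,n_d) \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |H_{M_k}(e^{j\omega})|^2 S_v(e^{j\omega},k) d\omega

with equality holding at the index n = n_p = n_d + n_0 if

    H_k(e^{j\omega}) = \frac{S^*(e^{j\omega}) e^{-j\omega n_0}}{S_v(e^{j\omega},k)}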


5.  Suboptimal  Detection 

Recall that the detection problem of Figure 2 calls for finding the time-varying
filter h_M(n,m) such that with h(k,k-n) = h_M(k,k-n), the signal-to-noise ratio R(n,n_d) is
maximized. Consider the suboptimal detector defined in terms of the optimal clairvoyant
detector according to h(k,k-n) = h_{M_k}(n) with response y(n,n_d) given by

    y(n,n_d) = \sum_{m=-\infty}^{n} h_{M_n}(n-m) r(m,n_d)    (17)

The following theorem states that under slowly varying conditions, the signal-to-noise
ratio R(n,n_d) resulting from h(k,k-n) = h_{M_k}(n) will be close to the maximum signal-to-noise
ratio resulting from h(k,k-n) = h_M(k,k-n). The statement of this theorem requires the
existence of fixed finite values b_\Gamma, b_G, b_H, and B_{y_s} such that

    |\Gamma(e^{j\omega},n)| > b_\Gamma > 0,    |G(e^{j\omega},n)| > b_G > 0,    |H(e^{j\omega},n)| > b_H > 0    (18)

for all integers n and \omega \in \mathbb{R}, and

    |y_s(n,n_d)|^2 < B_{y_s}    (19)

for all integers n and all positive integers n_d.

Theorem 10. Consider the detection problems diagrammed in Figure 2 and Figure 4. Let
b_\Gamma, b_G, b_H, B_{y_s}, B_G, B_H, B_\phi, and B_h be given by (18), (19), (10), (11), and (12)
respectively. Then with b_\Gamma, b_G, b_H fixed and h(k,k-n) = h_{M_k}(n),

    |R(n,n_d) - R_n(n,n_d)| \to 0 uniformly in \omega and n as |B_\phi| \to 0.

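Proof. First observe that H(e^{j\omega},n) = H_{M_n}(e^{j\omega}). Next write

    |R(n,n_d) - R_n(n,n_d)| = \left| \frac{|y_s(n,n_d)|^2}{\frac{1}{2\pi} \int_{-\pi}^{\pi} S_y(e^{j\omega},n) d\omega} - \frac{|y_s(n,n_d)|^2}{\frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{S}_y(e^{j\omega},n) d\omega} \right|

Then cross multiply to form a common denominator

    |R(n,n_d) - R_n(n,n_d)| = \frac{ 2\pi |y_s(n,n_d)|^2 \left| \int_{-\pi}^{\pi} \left( |H(e^{j\omega},n)|^2 S_v(e^{j\omega},n) - S_y(e^{j\omega},n) \right) d\omega \right| }{ \int_{-\pi}^{\pi} S_y(e^{j\omega},n) d\omega \int_{-\pi}^{\pi} |H(e^{j\omega},n)|^2 S_v(e^{j\omega},n) d\omega }

It follows from the given inequalities and Proposition 8 that

    |R(n,n_d) - R_n(n,n_d)| < \frac{B_{y_s} \epsilon}{b_\Gamma^2 b_H^2 b_G^4}    (19)

where \epsilon is a uniform bound on |\hat{S}_y(e^{j\omega},n) - S_y(e^{j\omega},n)| that tends to zero as |B_\phi| \to 0.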


With h(n,n-m) = h_n(m), Theorem 10 states that if the rate of variation in v(n) is
sufficiently small, then the functions R(n,n_d) and R_n(n,n_d) are close. An explicit bound
on the difference |R(n,n_d) - R_n(n,n_d)| is given by equation (19). The point is that when the
signal-to-noise ratio R(n,n_d) resulting from h_n(m) = h_{M_n}(m) is close to the maximum
signal-to-noise ratio resulting from h(n,n-m) = h_M(n,n-m), then the detector response
defined by (17) and generated using h(k,k-m) = h_{M_k}(m) is a "viable" suboptimal solution.
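
One way to sketch the suboptimal detector (17) is to rebuild the frozen-time matched filter of Theorem 9 at every output index and apply it to the most recent samples. In the code below, `spectrum_at` is a hypothetical user-supplied function returning S_v(e^{j\omega},n) on a frequency grid, and the FFT-based filter design (which wraps any noncausal part of the impulse response circularly) is an approximation adopted for this example only.

```python
import numpy as np

def suboptimal_detector(r, s, spectrum_at, n0, nfft=256):
    """Evaluate (17): y(n) = sum_m h_{M_n}(n - m) r(m), with r(m) = 0 for m < 0.

    spectrum_at(n, w) is a user-supplied (hypothetical) function returning the
    power spectrum S_v(e^{jw}, n) on the frequency grid w.
    """
    w = 2.0 * np.pi * np.arange(nfft) / nfft
    S = np.fft.fft(s, nfft)
    y = np.zeros(len(r))
    for n in range(len(r)):
        # Frozen-time matched filter of Theorem 9 for k = n.
        H = np.conj(S) * np.exp(-1j * w * n0) / spectrum_at(n, w)
        h = np.real(np.fft.ifft(H))
        taps = min(n + 1, nfft)
        past = r[n::-1][:taps]            # r(n), r(n-1), ..., most recent first
        y[n] = np.dot(h[:taps], past)
    return y

s = np.sin(np.arange(32) * np.pi / 4)
n_d, n0 = 40, 31
r = np.zeros(128)
r[n_d:n_d + 32] = s
flat = lambda n, w: np.ones_like(w)       # time-invariant white-noise check
y = suboptimal_detector(r, s, flat, n0)
```

With the flat, time-invariant spectrum used here the detector reduces to the ordinary matched filter, so the response to the noise-free signal peaks at n_d + n0. A time-varying `spectrum_at` changes the filter from one output index to the next.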


6.  Numerical  Example 

In  this  section  we  present  a  numerical  example  which  compares  the  response  of 
the  suboptimal  detector  defined  by  (17)  with  that  of  the  optimal  clairvoyant  detector 
defined  by  (15). 

In this example we wish to detect the signal s(n) = sin(n\pi/4), n = 0, 1, ..., 31, in the
received signal r(n,n_d) = s(n-n_d) + v(n). The signal s(n-n_d) for n_d = 128 is shown in Figure
5. The nonstationary interference v(n) has signal model given by

    G(e^{j\omega},n) = \frac{b(n)}{1 + a_1(n) e^{-j\omega} + a_2(n) e^{-j2\omega}}

where the numerator b(n) is chosen to normalize the process power over time, and the
denominator polynomial has roots M e^{\pm j\omega(n)}, where M = 0.9 and

    \omega(n) = \begin{cases}
        \pi/4,                                   & n \le 0 \\
        \pi/4 + (3\pi/4 - \pi/4) \, n/256,       & 1 \le n \le 256 \\
        3\pi/4,                                  & n \ge 257
    \end{cases}

A sample realization of the received signal r(n,n_d) is shown in Figure 6. Observe that the
frequency content of the sample realization in Figure 6 changes such that the initial
frequency is lower than the final frequency.

Figure 7 shows the detector response if one treats the noise as stationary and
white. The optimal clairvoyant detector response y(n,n_d) = y_n(n,n_d) defined by (15) is
shown in Figure 8, and the suboptimal detector response defined by (17) is shown
in Figure 9. Comparing Figures 7, 8, and 9 one finds that treating the interference as
stationary and white is inadequate for detection purposes, but one also finds that the
suboptimal detector provides a response that is both adequate and nearly identical to that
of the optimal clairvoyant detector.


[Plots omitted; the horizontal axis of each plot is the index n.]

Figure 5. Received signal.

Figure 6. Received noisy signal.

Figure 7. White noise detector response.

Figure 8. Optimal clairvoyant response.

Figure 9. Suboptimal response.


7.  Conclusions

We  showed  that  associated  with  every  PND  process  is  a  family  of  jointly  wide- 
sense  stationary  processes.  This  fundamental  interpretation  has  some  interesting 
theoretical  implications  including  a  definition  for  the  power  spectrum  of  a  nonstationary 
process.  This  interpretation  was  also  used  to  formulate  a  clairvoyant  detection  problem 
with  optimal  solution  given  in  terms  of  the  power  spectrum.  This  solution  was  applied 
to the nonstationary detection problem, and its performance was examined in terms of the
process rate of variation. It was shown that this solution provides nearly optimal
performance when the rate of variation is sufficiently small. Future work includes
developing techniques to treat the detection problem for the case of rapidly varying
nonstationary processes.

ABSTRACT

Impulsive  noise  described  by  an  alpha-stable  distribution  has  second-order  as  well  as  higher- 
order  moments  that  are  infinite.  For  this  noise,  estimates  of  conventional  parameters  such  as 
variance  and  power  spectra  are  not  consistent  and  can  appear  to  be  non-stationary.  The 
particular  type  of  impulsive  noise  examined  consists  of  complex  noise  samples  having  a 
bivariate  isotropic  alpha-stable  distribution.  The  complex-Gaussian  distribution  is  a  special 
case  where  the  envelope  is  Rayleigh  distributed  and,  only  for  this  case,  all  moments  exist  and 
are  consistent.  When  the  noise  is  highly  impulsive,  even  the  mean  or  first  moment  may  not 
exist.  Examples  of  moderately  impulsive  noise  include  sea  clutter,  the  radar  returns  from 
ocean  waves,  and  especially  high-resolution,  horizontally-polarized  returns.  Spectral  analysis 
of  this  noise,  such  as  Doppler  processing,  can  retain  the  impulsive  nature  of  the  noise  and 
make  the  spectral  estimates  inconsistent.  The  concept  of  positive  and  negative  lower-order 
moments  of  the  envelope  is  used  to  provide  robust  methods  for  obtaining  consistent  estimates 
in  the  presence  of  impulsive  noise. 


1.  Introduction 

1.1  General

From  the  viewpoint  of  the  measurement,  detection  and  characterization  of  signals  and 
noise,  the  most  commonly  assumed  probability  distribution  function  is  probably  the 
Gaussian  distribution.  This  would  include  the  many  Gaussian  derived  distributions  such  as 
the  Rayleigh,  Chi-squared,  noncentral  Chi-squared,  Student  t,  F,  etc.  The  statistics  that 
describe  observed  physical  phenomena  are  defined  by  this  assumption,  and  the  signal 
processing  methods  to  obtain  parameter  estimates  and  test  statistics  are  derived  under  this 
assumption.  The  Gaussian  assumption  can  generally  be  justified  using  the  central  limit 
theorem’ where the physical phenomenon being studied consists of sums of identically
distributed, statistically independent events with finite means and variances. The “identically
distributed” requirement can be dropped if moments higher than the variance exist. The
Gaussian assumption, however, is inappropriate when the means and variances of the
independent events are infinite.

When these moments are infinite, the generalized central limit theorem^-^ states that the
sums converge to the alpha-stable distribution of which the Gaussian distribution is a subset.
These  distributions  can  model  physical  phenomena  that  are  impulsive  or  spiky  in  nature. 
Such  phenomena,  for  example,  can  include  radio  static  from  lightning  storms,  as  well  as  high- 
resolution  radar  returns  from  sea  clutter.  Problems  begin  when  these  data  are  processed  using 
methods  based  on  conventional,  second-order  statistics  under  the  Gaussian  assumption. 

If  data  with  an  alpha-stable  distribution  are  processed  using  ordinary  second-order 
methods  such  as  variance,  correlation  or  power  spectrum  estimation,  then  the  results  can 
appear  to  be  non-robust,  inconsistent  or  nonstationary.  If  the  impulsive  nature  of  the  data  is 
recognized,  then  ad-hoc  processing  methods  such  as  clipping  or  the  removal  of  “outliers”  are 
used.  This  also  results  in  the  use  of  other  statistics  such  as  the  median  or  the  geometric  mean 
that  can  limit  the  description  or  characterization  of  the  data.  Empirically  derived  distributions 
such  as  log-normal  or  Weibull  are  often  fitted  to  the  data. 

An example of inconsistent mean-square estimates is given in Fig. 1. The top trace shows
the  envelope  of  bin  #16  from  a  sea  clutter  Doppler  spectrum  (such  as  from  one  velocity  bin 
of  a  Doppler  spectrum)  as  a  function  of  sample  number  (proportional  to  time).  The  bottom 
traces  show  running  averages  for  estimates  of  the  root-mean-square  (RMS,  square-root  of  the 
second  moment)  and  the  fourth  power  of  the  negative  0.25-order  moment.  These  two 
statistics  are  proportional  to  amplitude.  The  running  average  for  the  RMS  shows  the 
inconsistent  behavior  associated  with  infinite  variance.  In  contrast,  the  negative  0.25  order 
moment  stabilizes  and  shows  little  fluctuation. 
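The contrast between the two running estimates can be reproduced with simulated data. The following is a minimal sketch, not the processing used for Fig. 1: it assumes an isotropic Cauchy envelope (alpha = 1) in place of sea clutter, and the helper names `cauchy_envelope`, `running_amplitude`, and `relative_spread` are illustrative. Each running estimate is formed as (mean of z^p)^(1/p), which scales like amplitude for any order p.

```python
import numpy as np

def cauchy_envelope(n, gamma=1.0, seed=0):
    """Envelope samples of bivariate isotropic Cauchy noise (alpha = 1).

    Inverse-CDF sampling from F(z) = 1 - gamma / sqrt(z**2 + gamma**2).
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-12, 1.0 - 1e-12, n)  # keep away from 0 and 1
    return gamma * np.sqrt(1.0 / u**2 - 1.0)

def running_amplitude(z, p):
    """Running estimate of (mean of z**p)**(1/p), which scales like amplitude."""
    count = np.arange(1, len(z) + 1)
    return (np.cumsum(z**p) / count) ** (1.0 / p)

def relative_spread(trace):
    """(max - min) / median over the second half of a running estimate."""
    tail = trace[len(trace) // 2:]
    return float((tail.max() - tail.min()) / np.median(tail))

z = cauchy_envelope(20_000)
spread_rms = relative_spread(running_amplitude(z, 2.0))    # second moment (RMS)
spread_neg = relative_spread(running_amplitude(z, -0.25))  # order -0.25 moment
print(f"RMS spread: {spread_rms:.3f}  order -0.25 spread: {spread_neg:.3f}")
```

With infinite variance the running RMS keeps drifting or jumping as large samples arrive, while the negative-order estimate settles, which is the behavior described for Fig. 1.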

When the signals and noise from a physical phenomenon are recognized to be alpha-stable
distributed, the measurement, detection and characterization of this phenomenon can be
accomplished  using  methods  that  are  robust  and  give  consistent,  generalized  results.  This 
paper  presents  examples  where  the  theory  is  applied  to  radar  returns  from  sea  clutter  for 
detection,  estimation  of  moments  and  distribution  parameters,  and  Doppler  spectrum 
analysis. 


Fig.  1.  Example  of  Inconsistent  RMS  Estimate 




1.2  Alpha-Stable  Distributions 

The  study  of  alpha-stable  distributions  and  their  application  to  signal  processing  has 
grown  significantly  over  the  past  several  years.  A  tutorial  review  of  the  basic  characteristics 
of stable distributions and stable signal processing was given in 1993 by Shao and Nikias [2].
This review was followed by a book by Nikias and Shao [3], published in 1995, that provides
an introduction to stable distributions and processes and that emphasizes the application of
alpha-stable concepts to signal processing. In the field of stochastic modeling, Samorodnitsky
and Taqqu [4] published a book in 1994 that targets researchers in probability as well as applied
probability  and  statistics. 

For the alpha-stable distribution, the characteristic exponent, $\alpha$,
varies over $0 < \alpha \le 2$. The stable distribution includes the Gaussian when $\alpha = 2$. As $\alpha$
becomes smaller than 2, the random process becomes more impulsive and more
non-Gaussian in nature: the tails of the stable distribution become thicker. Hence, the stable
distribution is an attractive choice for modeling signals and noise with an impulsive nature.
For $\alpha < 2$, the random process has infinite variance (and infinite moments of order higher
than two); however, a range of fractional lower-order and negative-order moments exist that are
consistent.

From the generalized central limit theorem, the stable distribution is the only limiting
distribution for normalized sums of independent and identically distributed random variables. To quote
Shao and Nikias [2]: “If an observed signal or noise can be thought of as the sum or results
of a large number of independent and identically distributed effects, then the Generalized
Central Limit Theorem suggests that a stable model may be appropriate.” This is related to
the stability property quoted from Shao and Nikias [2]: “the sum of two independent stable
random variables with the same characteristic exponent is again stable and has the same
characteristic exponent.”
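The stability property can be checked numerically for the one-dimensional Cauchy case (alpha = 1). This is an illustrative sketch using NumPy's `standard_cauchy` generator; the helper names are not from the paper. For a Cauchy variable of scale s, the median of |x| equals s and the ratio of the 90th percentile of |x| to its median is tan(0.45*pi), about 6.31, so both the scale and the tail shape can be read from sample quantiles.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x1 = rng.standard_cauchy(n)
x2 = rng.standard_cauchy(n)
total = x1 + x2  # sum of two independent unit-scale Cauchy variables

def scale_from_median(x):
    """For a Cauchy variable, the median of |x| equals its scale."""
    return float(np.median(np.abs(x)))

def tail_ratio(x):
    """90th percentile of |x| over its median; tan(0.45*pi) for any Cauchy scale."""
    a = np.abs(x)
    return float(np.quantile(a, 0.9) / np.median(a))

print("scale of x1:", scale_from_median(x1), " scale of sum:", scale_from_median(total))
print("tail ratio of x1:", tail_ratio(x1), " tail ratio of sum:", tail_ratio(total))
```

The sum has twice the scale but the same tail ratio, which is what "again stable with the same characteristic exponent" means for this case.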

1.3  Radar  Returns  from  Sea  Clutter 

Stable  distributions  can  model  radar  returns  from  sea  clutter,  the  radar  reflections  from 
ocean  waves.  Radar  returns  are  generally  more  spiky  when  using  horizontal  polarization  for 
transmit and receive, and when using high spatial resolution. The time-varying clutter tends
to prevent radar detection of targets; the spiky nature of the clutter tends to raise the
false-alarm rate. Knowledge of the distribution can result in detector designs that are tailored to
the type of clutter. In Nohara and Haykin [5], the targets of interest are small fragments of
icebergs.

As discussed by Nohara and Haykin [5] and Armstrong and Griffiths [6], the K-distribution is
currently a widely accepted model for representing the amplitude or envelope statistics of sea
clutter. The distribution is based on the assumption that the radar return from a region consists
of a sum of independent returns (speckle) that vary in intensity with time. The I and Q
(in-phase and quadrature) returns from sea clutter are described as a complex Gaussian process
(short decorrelation time) modulated by a second random process (long decorrelation time)
that is chi-distributed. The modulation results in spiky radar returns.

Field  data,  however,  suggest  that  the  K-distribution  does  not  fully  explain  observed  sea 
clutter  distributions.  An  example  using  horizontal  (H-pol)  polarized  sea  clutter  is  presented 
later  in  this  paper.  The  K-distribution  is  shown  to  match  the  main  body  of  the  distribution; 
however,  it  fails  to  match  the  tail.  The  alpha-stable  distribution  is  shown  to  match  both. 


2.  Alpha-Stable  Envelope  and  Its  Moments 


The  first  applications  of  the  alpha-stable  distribution  are  related  to  the  envelope  and  its 
moments.  The  probability  density  function  for  the  envelope  of  bivariate  isotropic  stable 
distributed noise is

$$f(z) = z \int_0^\infty s \exp(-\gamma s^{\alpha})\, J_0(sz)\, ds \tag{1}$$

where the noise envelope, $z$, is given by

$$z = \sqrt{z_I^2 + z_Q^2} \tag{2}$$

and the in-phase and quadrature components $z_I$ and $z_Q$ are jointly isotropic alpha-stable with
dispersion $\gamma$. The $p$th moment is defined by

$$m_p = E[z^p] = \int_0^\infty z^p f(z)\, dz \tag{3}$$

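For the $M$-variate case (for the bivariate case, $M = 2$), the moment is

$$m_p = \frac{2^{p}\,\Gamma\!\left(\frac{p+M}{2}\right)\Gamma\!\left(1-\frac{p}{\alpha}\right)}{\Gamma\!\left(\frac{M}{2}\right)\Gamma\!\left(1-\frac{p}{2}\right)}\,\gamma^{p/\alpha} \tag{4}$$

For $\gamma > 0$ and $0 < \alpha < 2$, $-M < p < \alpha$. For the Gaussian case, $\alpha = 2$, $-M < p < \infty$ and
the variance is $\sigma^2 = 2\gamma$.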

The  moment  that  appears  to  be  particularly  useful  for  normalizing  the  envelopes  of  stable 
bivariate  distributions  is  the  negative  first-order  moment  (NFM) 


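$$m_{-1} = \frac{\Gamma(1/\alpha)}{\alpha\,\gamma^{1/\alpha}} \tag{5}$$

which can be calculated from estimates of $\alpha$ and $\gamma$, or the NFM can be estimated from

$$\hat{m}_{-1} = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{z_n} \tag{6}$$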

The NFM estimator is unbiased; however, as shown later, its variance is infinite. Since the
envelope density function is scaled by $z/\gamma^{1/\alpha}$, the envelope samples can be normalized in
amplitude by multiplying the samples by the negative first-order moment estimate. This
normalization method will be used in this paper.
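The NFM estimate and the normalization step can be sketched on simulated envelopes. This example is illustrative only: it assumes the NFM has the form Gamma(1/alpha) / (alpha * gamma^(1/alpha)), draws the alpha = 2 envelope from a Rayleigh density with per-component variance 2*gamma, and draws the alpha = 1 envelope by inverse-CDF sampling of the isotropic Cauchy envelope. The function names are hypothetical.

```python
import math
import numpy as np

def nfm_theory(alpha, gamma):
    """Negative first-order moment, Gamma(1/alpha) / (alpha * gamma**(1/alpha))."""
    return math.gamma(1.0 / alpha) / (alpha * gamma ** (1.0 / alpha))

def nfm_estimate(z):
    """Sample mean of 1/z over the envelope samples."""
    return float(np.mean(1.0 / z))

rng = np.random.default_rng(0)
n, gamma = 200_000, 2.0

# alpha = 2: Rayleigh envelope with per-component variance sigma**2 = 2 * gamma
z_rayleigh = rng.rayleigh(scale=math.sqrt(2.0 * gamma), size=n)
# alpha = 1: isotropic Cauchy envelope, F(z) = 1 - gamma / sqrt(z**2 + gamma**2)
u = rng.uniform(1e-12, 1.0 - 1e-12, n)
z_cauchy = gamma * np.sqrt(1.0 / u**2 - 1.0)

for name, alpha, z in [("Rayleigh", 2.0, z_rayleigh), ("Cauchy", 1.0, z_cauchy)]:
    m_hat = nfm_estimate(z)
    normalized = z * m_hat  # dimensionless envelope
    print(name, "estimate:", m_hat, "theory:", nfm_theory(alpha, gamma),
          "normalized median:", float(np.median(normalized)))
```

Because the normalized envelope is dimensionless, data sets with different dispersion can be compared on one amplitude axis.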
The envelope density function given by Eq. (1) only has a closed-form solution for the
Gaussian or Rayleigh ($\alpha = 2$) and the isotropic Cauchy ($\alpha = 1$) density functions. The
Rayleigh density is

$$f(z) = \frac{z}{2\gamma}\exp\!\left(-\frac{z^2}{4\gamma}\right) \tag{7}$$

and the isotropic Cauchy density is

$$f(z) = \frac{z\gamma}{(z^2+\gamma^2)^{3/2}} \tag{8}$$

Otherwise  the  density  function  is  calculated  by  numerical  integration  of  Eq.  (1).  Graphs  of 
the  density  function  normalized  by  the  NFM  are  given  in  Fig.  2. 

Methods for estimating $\alpha$ are presented in the literature; however, for the envelope, the
graphs of the density function in Fig. 2 indicate that a simple approach would be to count how
often the envelope exceeds a specified level. Since a “pivot” exists at a normalized amplitude
of 3.1, setting the threshold at this level would give the greatest variation in the number of
occurrences for different alpha. Fig. 3 gives the probability of exceeding the normalized
amplitudes of 3.1 and 2.5. These probabilities were obtained by numerical integration of the
density functions. A two-pass operation is required on the data: the NFM is estimated first,
followed by a second pass to count the number of times the levels are exceeded. If sufficient
samples exist, then comparison of a histogram to the calculated distribution is still necessary
to verify the fit to the tail of the distribution.

Fig. 2. Envelope Distribution for Bivariate Isotropic Stable Distributed Noise

Fig. 3. Probability of Exceeding Normalized Envelope Levels of 3.1 and 2.5
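The two-pass exceedance count can be sketched as follows. This is an illustration on simulated data, not the authors' code. The two end points have closed forms that follow from the Rayleigh and isotropic Cauchy envelope densities: the probability of exceeding a normalized level u is exp(-u^2/pi) for alpha = 2 and 1/sqrt(1 + u^2) for alpha = 1. Intermediate alpha values would require numerical integration of the envelope density, which is not attempted here.

```python
import math
import numpy as np

def exceedance_probability(z, level):
    """Two-pass estimate: NFM first, then the fraction of normalized samples above level."""
    m_hat = np.mean(1.0 / z)                   # pass 1: negative first-order moment
    return float(np.mean(z * m_hat > level))   # pass 2: count exceedances

rng = np.random.default_rng(0)
n = 200_000
z_rayleigh = rng.rayleigh(scale=1.0, size=n)   # alpha = 2 envelope
u = rng.uniform(1e-12, 1.0 - 1e-12, n)
z_cauchy = np.sqrt(1.0 / u**2 - 1.0)           # alpha = 1 envelope, gamma = 1

level = 3.1
p_rayleigh_exact = math.exp(-level**2 / math.pi)
p_cauchy_exact = 1.0 / math.sqrt(1.0 + level**2)

print("alpha = 2:", exceedance_probability(z_rayleigh, level), "exact:", p_rayleigh_exact)
print("alpha = 1:", exceedance_probability(z_cauchy, level), "exact:", p_cauchy_exact)
```

An observed exceedance probability between these two end points would then be mapped to an alpha estimate through a curve such as the one in Fig. 3.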

Next, the median and mode are numerically calculated from the envelope density and
normalized by the NFM. As shown in Fig. 4, the mode or peak of the distribution (the most
probable value) varies almost linearly as alpha varies from 1 to 2. Of greater interest is the
variation of the median with alpha. Since the median is nearly constant as alpha varies from
about 1.4 to 2, the NFM is a close estimator of the median and vice versa. Over this range the median
multiplied by the NFM is about 1.48. The circles at the end points indicate median values calculated from Eq.
(7) and Eq. (8).
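The two end-point values of the normalized median can be worked out by hand. The short derivation below assumes the Rayleigh density $f(z) = \frac{z}{2\gamma}e^{-z^2/(4\gamma)}$, the isotropic Cauchy density $f(z) = z\gamma/(z^2+\gamma^2)^{3/2}$, and the NFM $m_{-1} = \Gamma(1/\alpha)/(\alpha\gamma^{1/\alpha})$; each median follows from setting the cumulative distribution $F(z)$ equal to one half.

```latex
\begin{align*}
\alpha = 2:&\quad F(z) = 1 - e^{-z^2/(4\gamma)} = \tfrac{1}{2}
  \;\Rightarrow\; z_{\mathrm{med}} = 2\sqrt{\gamma \ln 2}, \quad
  m_{-1} = \frac{\sqrt{\pi}}{2\sqrt{\gamma}}, \quad
  z_{\mathrm{med}}\, m_{-1} = \sqrt{\pi \ln 2} \approx 1.476 \\
\alpha = 1:&\quad F(z) = 1 - \frac{\gamma}{\sqrt{z^2 + \gamma^2}} = \tfrac{1}{2}
  \;\Rightarrow\; z_{\mathrm{med}} = \sqrt{3}\,\gamma, \quad
  m_{-1} = \frac{1}{\gamma}, \quad
  z_{\mathrm{med}}\, m_{-1} = \sqrt{3} \approx 1.732
\end{align*}
```

The Gaussian end point reproduces the value of about 1.48, and the dispersion $\gamma$ cancels in both products, as expected for a normalized quantity.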

If  second-order  statistics  are  used  with  alpha-stable  distributed  data  where  saturation  or 
clipping  occurs,  then  the  second-order  statistics  can  still  give  arbitrary  results.  This  can  easily 
occur,  for  example,  in  a  radar  system  where  the  radio  frequency  (RF)  amplifiers  and/or  the 
analog-to-digital  (A/D)  converters  are  driven  into  saturation  by  large  spikes  from  the  clutter 
returns.  To  show  the  variation  in  root-mean-square  (RMS)  as  a  function  of  alpha  and  clipping 
level,  the  second  moment  (mean-square)  was  calculated  by  numerically  integrating  Eq.  (1). 
When  the  saturation  level  was  reached,  the  density  function  was  replaced  by  a  delta  function 
that  represented  the  remaining  area  under  the  density  function  beyond  the  clipping  level. 
Since  the  tail  no  longer  extends  to  infinity,  the  second  moment  is  finite.  The  results  of  this 
integration are presented in Fig. 5. The RMS is normalized by the NFM (assumed to be
known). The clipping level is likewise normalized by the NFM. Alpha was varied from 1 to
2 and the clipping level was varied from 1 to 20. At alpha equal to 2 and no clipping, the RMS
is $\sqrt{\pi}$. For alpha equal to 2, clipping levels above 4 have very little effect on the RMS.
However, for alpha equal to 1, the RMS varies from 2.5 to 6.5 as the clipping level varies from
4 to 20. The results are highly sensitive to clipping level and alpha.

Fig. 4. Median and Mode for Envelope of Bivariate Isotropic Stable Distribution

In  contrast,  when  the  negative  first-order  moment  is  calculated  using  numerical  integration 
and  clipping,  the  NFM  calculation  normalized  by  the  known  NFM  is  constant  at  unity  for  all 
clipping levels from 3 to 20. The NFM, like the median, is largely unaffected by saturation.
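The clipping calculation can be sketched for the one case with a simple closed-form density, the isotropic Cauchy envelope (alpha = 1). This is an illustration under stated assumptions, not the code behind Fig. 5: the dispersion is set to 1 so that the noise NFM is 1 and all quantities are already normalized, the integral is a plain trapezoidal rule, and the function names are hypothetical. The probability beyond the clipping level is lumped into a point mass at that level, as described above.

```python
import numpy as np

def cauchy_envelope_pdf(z):
    """Isotropic Cauchy envelope density with gamma = 1, so the NFM equals 1."""
    return z / (1.0 + z**2) ** 1.5

def clipped_moment(p, clip, n_grid=400_001):
    """E[min(z, clip)**p]: trapezoidal integral up to clip plus a point mass at clip."""
    z = np.linspace(1e-9, clip, n_grid)
    y = z**p * cauchy_envelope_pdf(z)
    body = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z))
    tail_mass = 1.0 / np.sqrt(1.0 + clip**2)  # probability of exceeding clip
    return float(body + clip**p * tail_mass)

for clip in (4.0, 20.0):
    rms = np.sqrt(clipped_moment(2.0, clip))   # normalized RMS of the clipped envelope
    nfm = clipped_moment(-1.0, clip)           # normalized NFM of the clipped envelope
    print(f"clip = {clip:4.1f}  RMS = {rms:.3f}  NFM = {nfm:.4f}")
```

The clipped second moment keeps growing with the clipping level, while the clipped NFM stays close to its unclipped value of 1.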

Fig. 5. RMS at Different Normalized Clipping Levels

3. Sea Clutter as a Stable Process

3.1 Fit to Histogram

Using the NFM to normalize the histograms and using the probability-of-exceedance
method to estimate $\alpha$, an example of fitting sea clutter histograms to the stable envelope
distribution is given in Fig. 6 for horizontally polarized (H-Pol) sea clutter returns. The
comparison includes the K-distribution, which was also normalized by the negative first-order
moment. The sea clutter data were taken in sea state 3 with an X-band radar at an 8° look-down
angle with a spatial resolution of 1.52 m. This represents a nominal sea condition. The clutter
data in Fig. 6 had an exceedance probability of 0.109, which corresponds to $\alpha = 1.75$. The K-distribution
was fitted using the shape parameter $\nu = 4.0$. The K-distribution closely matches the main
body of the histogram up to a normalized amplitude of 4. Beyond this amplitude, the
K-distribution diverges significantly and does not match the tail of the histogram. The stable
distribution gives a much closer match to the entire histogram; however, the stable
distribution does overestimate the tail. The stable distribution does a better job of modelling
the impulsive nature of the H-Pol clutter.

3.2  Envelope  Detectors 

The performance of several envelope detectors is examined using the H-Pol sea clutter
as noise or interference. The target is simulated as a sinewave with constant amplitude, $A$,
and unknown, random phase, added to the I and Q signals. For radar, this is equivalent
to a non-fading point target where from one sample to the next the phase varies randomly with
a uniform distribution over the interval $[0, 2\pi)$. The envelope of this sum is the input to the
detectors. The hypotheses are

$$\begin{aligned}
H_0:&\quad z_n = \sqrt{z_{I,n}^2 + z_{Q,n}^2} \\
H_1:&\quad z_n = \sqrt{(z_{I,n} + A\cos\theta_n)^2 + (z_{Q,n} + A\sin\theta_n)^2}
\end{aligned} \tag{9}$$

Fig. 6. Envelope Distribution for H-Pol Sea Clutter with 1.52 m resolution


Except for normalization by the NFM, the test statistic for the envelope detector is given
by Whalen [1]

$$g_{env} = \frac{m_{-1}}{N}\sum_{n=1}^{N} z_n \tag{10}$$

The negative first-order moment (NFM) detector has a test statistic that is the reciprocal of
the NFM estimator

$$g_{NFM} = m_{-1}\left[\frac{1}{N}\sum_{n=1}^{N}\frac{1}{z_n}\right]^{-1} \tag{11}$$


Fig.  7.  Probability  of  Detection  Curves  for  Different  Detectors  at  PFA  =  10  '* 


The clipper is identical to the envelope detector with the envelope clipped at level $C$.

$$g_{clip} = \frac{m_{-1}}{N}\sum_{n=1}^{N}\min(z_n, C) \tag{12}$$

Usually the detector is normalized by the noise-only standard deviation; however, since this
moment does not exist, the test statistics are normalized by the negative first-order moment,
$m_{-1}$, for the noise-only case. Constant false alarm rate (CFAR) schemes would be based on
estimates of this moment.
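The three test statistics can be sketched as short functions. This is a hedged illustration: the exact scaling of the statistics in the original equations is not certain, so each function below uses a sample mean (an assumption; a constant factor does not change detection performance once the threshold is set for a given false-alarm rate), and the names `envelope_statistic`, `nfm_statistic`, and `clipper_statistic` are hypothetical. The example shows how a single large spike affects each statistic.

```python
import numpy as np

def envelope_statistic(z, m_neg1):
    """Envelope detector: mean envelope, normalized by the noise-only NFM."""
    return float(m_neg1 * np.mean(z))

def nfm_statistic(z, m_neg1):
    """NFM detector: reciprocal of the NFM estimate, normalized by the noise-only NFM."""
    return float(m_neg1 / np.mean(1.0 / z))

def clipper_statistic(z, m_neg1, clip):
    """Clipper: envelope detector applied to samples limited to the level clip."""
    return float(m_neg1 * np.mean(np.minimum(z, clip)))

# 32 samples of unit envelope with one large clutter spike
z = np.ones(32)
z[5] = 1000.0
m_neg1 = 1.0        # noise-only NFM, assumed known for this illustration
clip = 3.0 / m_neg1

g_env = envelope_statistic(z, m_neg1)
g_nfm = nfm_statistic(z, m_neg1)
g_clip = clipper_statistic(z, m_neg1, clip)
print(g_env, g_nfm, g_clip)
```

The spike dominates the envelope statistic but barely moves the NFM and clipper statistics, which is why those two detectors tolerate spiky clutter better.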

The  results  from  applying  these  detectors  to  the  H-Pol  sea  clutter  are  shown  in  Fig.  7  for 
the  case  of  N  =  32  independent  samples  and  a  probability  of  false  alarm  (PFA)  of  10  '^.  The 
signal-to-noise  ratio  (SNR)  is  defined  as 


$$\mathrm{SNR} = 20\log(A\,m_{-1}) \tag{13}$$

For Gaussian distributed noise, $m_{-1}$ is related to the standard deviation, $\sigma$, by

$$m_{-1}\,\sigma = \sqrt{\pi} \tag{14}$$

which is about 5 dB. In Fig. 7, the envelope detector in Gaussian noise has a probability of
detection (PD) of 0.5 at 4 dB for the same PFA. Because of the 5 dB difference in the definition
of SNR, this corresponds to -1 dB for the curves given in Whalen [1], where the test statistic is
normalized by standard deviation. The envelope detector in H-Pol clutter has PD = 0.5 at 12.5
dB, so the envelope detector suffers an 8.5 dB loss in this H-Pol clutter. (If the quadratic
detector were used, then the loss would be closer to 12 dB.) The NFM detector shows a 4.5
dB improvement over the envelope detector while the clipper shows a 5 dB improvement.
The clipper level was set to

$$C = 3/m_{-1} \tag{15}$$

For Gaussian noise, this corresponds to $1.69\sigma$. The performance of this detector is sensitive
to small changes in clipper level. Because of the clipping, this detector may not work at very
low false alarm rates. The NFM detector does not have these two problems, but it does have
slightly worse performance.

4.  Generalized  Cauchy  Envelope  Distribution 


Closed-form solutions for the negative first-order moment of a sinewave plus noise exist
for Gaussian and Cauchy distributed noise. The results for Gaussian distributed noise are
given by Whalen [1]. Similar results for Cauchy distributed noise are given below.

The envelope of a sinewave plus Cauchy distributed noise is derived from the bivariate
isotropic Cauchy distribution

$$f(z_I, z_Q) = \frac{\gamma}{2\pi\,(z_I^2 + z_Q^2 + \gamma^2)^{3/2}} \tag{16}$$

where $\gamma$ is the dispersion. For the $n$th sample, the transformation is

$$\begin{aligned}
z_{I,n} &= z_n\cos\phi_n + A_n\cos\theta_n \\
z_{Q,n} &= z_n\sin\phi_n + A_n\sin\theta_n
\end{aligned} \tag{17}$$

where $z_n$ is the envelope of the Cauchy noise with phase, $\phi_n$, uniformly distributed over
$[0, 2\pi)$. The sinewave amplitude, $A_n$, and phase, $\theta_n$, can vary from sample to sample. The
Jacobian is $z_n$, which gives (dropping the explicit sampling dependence)

$$f(z,\phi\,|\,A,\theta) = \frac{\gamma z}{2\pi\,\left(z^2 + A^2 + \gamma^2 + 2zA\cos(\phi-\theta)\right)^{3/2}}$$

$$f(z\,|\,A,\theta) = \int_0^{2\pi} f(z,\phi\,|\,A,\theta)\,d\phi$$

$$f(z\,|\,A,\theta) = \frac{2\gamma z\,E\!\left(\sqrt{2b/(a+b)}\right)}{\pi\,(a-b)(a+b)^{1/2}}$$

where $a = z^2 + A^2 + \gamma^2$ and $b = 2zA$ and $E(\cdot)$ is the complete elliptic integral of the 2nd
kind (with its argument taken as the modulus). The density is independent of $\theta$, so the density function of the envelope of a sinewave
plus narrowband Cauchy noise is

$$f(z\,|\,A) = \frac{2\gamma z\,E\!\left(\sqrt{2b/(a+b)}\right)}{\pi\,(a-b)(a+b)^{1/2}}$$

Fig. 8. Generalized Cauchy Envelope Distribution
Fig.  8.  Generalized  Cauchy  Envelope  Distribution 


To this point, the derivation of the density function is similar to that given by Tsihrintzis and
Nikias [7]. For Gaussian noise, the corresponding density function is referred to as the Rician
or generalized Rayleigh, so this density will be referred to as the generalized Cauchy
envelope distribution. For a non-fading signal, the amplitude, $A$, is taken to be a constant over
the observed samples.

Next the density function will be normalized by the negative first-order moment (NFM)
of the noise

$$m_{-1} = 1/\gamma$$

Let $u = z\,m_{-1}$ and $\delta = A\,m_{-1}$, and the density function becomes

$$f(u\,|\,\delta) = \frac{2u\,E\!\left(\sqrt{2b/(a+b)}\right)}{\pi\,(a-b)(a+b)^{1/2}}$$

where now $a = u^2 + \delta^2 + 1$ and $b = 2u\delta$. This density is shown in Fig. 8 for different values
of $\delta$.

The  negative  first-order  moment  of  the  generalized  Cauchy  distribution,  or  generalized 
Cauchy  NFM  is  derived  by  integrating  according  to  the  following  order 


(22) 

(23) 


(24) 
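and then normalizing by the noise-only NFM, $m_{-1}$, to get the closed-form solution

$$\frac{m_{-1}(\delta)}{m_{-1}} = \frac{1}{\sqrt{\delta^2+1}} \tag{25}$$

where $m_{-1}(\delta)$ denotes the generalized Cauchy NFM and $\delta$ is the normalized sinewave amplitude.

Fig. 9. Inverse Negative First-Order Moment for Generalized Cauchy (top) and Rician (bottom)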


The inverse of the NFM is shown in Fig. 9 for both the generalized Cauchy density (top
curve) and the Rician density (bottom curve). The NFM for the Rician is also normalized by
the noise-only NFM. The inverse NFM calculated using simulated Cauchy noise is given by
squares;  the  inverse  NFM  using  simulated  Gaussian  noise  is  given  by  circles.  For  each  case, 
the  calculated  values  closely  match  the  estimated  values.  The  triangles  are  inverse  NFM 
estimates  from  H-Pol  sea  clutter  with  a  =  1.75.  The  sea  clutter  estimates  closely  match  those 
from  the  Rician. 
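The closed-form generalized Cauchy NFM can be checked by simulation. The sketch below is illustrative and makes two assumptions that are not from the paper: bivariate isotropic Cauchy noise with unit dispersion is generated as a pair of independent standard Gaussians divided by the magnitude of a third, and the normalized inverse NFM is compared against sqrt(delta^2 + 1). The function name is hypothetical.

```python
import numpy as np

def sine_plus_cauchy_envelope(delta, n, seed=0):
    """Envelope of a sinewave of normalized amplitude delta plus unit-dispersion Cauchy noise."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, 3))
    scale = np.abs(g[:, 2])
    noise_i = g[:, 0] / scale          # in-phase Cauchy noise
    noise_q = g[:, 1] / scale          # quadrature Cauchy noise (same scale: isotropic)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)  # random sinewave phase per sample
    return np.hypot(noise_i + delta * np.cos(theta),
                    noise_q + delta * np.sin(theta))

results = {}
for delta in (0.0, 1.0, 2.0, 5.0):
    u = sine_plus_cauchy_envelope(delta, 400_000)
    results[delta] = 1.0 / np.mean(1.0 / u)   # inverse NFM estimate
    print(delta, results[delta], np.sqrt(delta**2 + 1.0))
```

With unit dispersion the noise-only NFM is 1, so no further normalization is needed before comparing with the closed form.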


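The probability density function of the phase is derived for a given sinewave amplitude
and phase. The function is

$$f(\phi\,|\,A,\theta) = \int_0^\infty f(z,\phi\,|\,A,\theta)\,dz \tag{26}$$

where after manipulation and normalization, the Cauchy phase PDF is

$$f(\phi\,|\,\delta,\theta) = \frac{1}{2\pi}\,\frac{\sqrt{\delta^2+1} - \delta\cos(\phi-\theta)}{\delta^2 + 1 - \delta^2\cos^2(\phi-\theta)} \tag{27}$$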
5. Statistical Error for Lower-Order Moments


Fig. 10 shows this relationship for the bivariate case as alpha is varied from 1 to 2. As the noise
becomes more impulsive, the negative lower-order moments have the least statistical error.

These  relationships  provide  a  way  for  estimating  alpha  and  gamma.  Since  gamma  is 
independent  of  moment  order  and  alpha,  a  series  of  moments  with  different  order  can  be 
calculated from the same set of samples. Gamma is estimated using Eq. (33) as alpha is varied.
The alpha that gives a constant gamma estimate for the range of moment order is then the
estimate of alpha. The moment order interval can be selected to minimize sampling error. The
tails  of  the  distribution,  however,  are  not  checked  by  this  method. 
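The search for the alpha that makes the gamma estimates constant across moment order can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: the bivariate moment relation is assumed to be E[z^p] = C(p, alpha) * gamma^(p/alpha) with C = 2^p Gamma((p+2)/2) Gamma(1 - p/alpha) / Gamma(1 - p/2), the moment orders and the alpha grid are arbitrary choices, "constant" is measured by the standard deviation of log(gamma) across orders, and all function names are hypothetical.

```python
import math
import numpy as np

def moment_constant(p, alpha, m=2):
    """C(p, alpha, M) in E[z**p] = C * gamma**(p/alpha) (assumed form of the moment relation)."""
    return (2.0**p * math.gamma((p + m) / 2.0) * math.gamma(1.0 - p / alpha)
            / (math.gamma(m / 2.0) * math.gamma(1.0 - p / 2.0)))

def estimate_alpha_gamma(z, orders=(-0.6, -0.4, -0.2, 0.2, 0.3), alphas=None):
    """Pick the alpha on a grid whose per-order gamma estimates agree best."""
    if alphas is None:
        alphas = np.arange(1.0, 2.0001, 0.05)
    sample_moments = [float(np.mean(z**p)) for p in orders]
    best = None
    for alpha in alphas:
        log_gammas = [(alpha / p) * math.log(mp / moment_constant(p, alpha))
                      for p, mp in zip(orders, sample_moments)]
        spread = float(np.std(log_gammas))
        if best is None or spread < best[0]:
            best = (spread, float(alpha), math.exp(float(np.mean(log_gammas))))
    return best[1], best[2]

rng = np.random.default_rng(0)
n = 200_000
u = rng.uniform(1e-12, 1.0 - 1e-12, n)
z_cauchy = np.sqrt(1.0 / u**2 - 1.0)                          # alpha = 1, gamma = 1
z_rayleigh = rng.rayleigh(scale=math.sqrt(2.0), size=n)       # alpha = 2, gamma = 1

print("Cauchy:  ", estimate_alpha_gamma(z_cauchy))
print("Rayleigh:", estimate_alpha_gamma(z_rayleigh))
```

The moment orders are kept between -1 and alpha/2 so that the sample moments themselves have finite variance. As noted above, a method of this kind does not check the tails of the distribution.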


Fig. 10. Statistical Error for Gamma Estimates for Alpha = 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.95, 1.96, 1.97, 1.98, 1.99, 1.995, and 2


The statistics of short sea clutter segments can be examined using the moment method
for estimating alpha and gamma. Figs. 11 and 12 show these results from 60 segments for
H-pol and V-pol sea clutter with 2.1 m resolution in sea state 2. Each segment is 0.64 seconds
long and consists of 400 samples (correlated), so each figure presents results from 38.4
seconds. The H-pol and V-pol data were collected on alternate samples, and the data are
uncalibrated. The alpha estimated over the entire data set is 1.65 for the H-pol and 1.95 for
the V-pol. Both data sets show significant variation in both alpha and gamma; the parameters
are time varying. As expected, the H-pol data in Fig. 11 show the most variation in both
parameters, and the V-pol data in Fig. 12 have an alpha very close to 2.

6.  Spectrum  Analysis 

For the non-Gaussian case when $\alpha < 2$, estimates of the auto-spectrum become inconsistent
since the random process has infinite variance. Estimating the auto-spectrum using the Welch
method [8] is the same as taking the second-order moment of the envelope

$$\hat{S}(\omega_j) = \frac{1}{N}\sum_{n=1}^{N}\left|X_n(\omega_j)\right|^2$$

where $X_n(\omega_j)$ is the fast Fourier transform (FFT) of the $n$th windowed, overlapping data
segment.

$$X_n(\omega_j) = \sum_{k=1}^{K} w(k\Delta t)\,x\big((k + s_n)\Delta t\big)\,e^{-i\omega_j k\Delta t} \tag{36}$$

where $\omega_j = 2\pi(j-1)/(K\Delta t)$ and $s_n = (n-1)S$. Each segment is shifted $S$ samples relative to
the last segment, $K$ is the FFT size, $w(k\Delta t)$ is the window function and $\Delta t$ is the time between
samples. To define a spectrum when $\alpha < 2$, the approach taken is to estimate alpha and
gamma from the lower-order moments of $|X_n(\omega_j)|$ as described in Sec. 5.

Fig. 11. Variation in Alpha and Gamma Over 0.64 Second Segments for Horizontally Polarized Sea Clutter

Fig. 12. Variation in Alpha and Gamma Over 0.64 Second Segments for Vertically Polarized Sea Clutter

The time history, $x(k\Delta t)$, can be either a real-valued or a complex stable random process.
For  either  case,  the  FFT  will  produce  bivariate  isotropic,  alpha-stable  distributed  samples  for 
each  data  segment.  For  sea  clutter,  the  data  are  complex. 

An  example  using  H-pol  sea  clutter  to  produce  a  Doppler  spectrum  is  given  in  Fig.  13. 
The  H-pol  clutter  was  taken  with  2.1  m  down-range  resolution  in  sea  state  2  with  a  sample 
rate  of  625  Hz.  An  artificial  target  was  added  to  bin  #16  (velocity  index  #16).  The  spectrum 
was  produced  from  373  segments,  each  128  samples  long  with  50%  overlap.  The  cosine  data 
window  was  used;  it  was  scaled  to  preserve  the  amplitude  of  the  sinusoid 

$$\sum_{k=1}^{K} w(k\Delta t) = 1 \tag{37}$$
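The segmenting, windowing, and amplitude-preserving scaling can be sketched as follows. This is an illustration with assumptions that are not confirmed by the text: the "cosine data window" is taken to be a Hann window (`np.hanning`), the window is scaled so that its samples sum to 1, and the FFT carries no additional 1/K factor. The function name `segment_ffts` is hypothetical.

```python
import numpy as np

def segment_ffts(x, fft_size=128, overlap=0.5):
    """FFTs of windowed, overlapping segments; window scaled so its samples sum to 1."""
    step = int(fft_size * (1.0 - overlap))
    window = np.hanning(fft_size)
    window = window / window.sum()            # preserves the amplitude of a sinusoid
    n_seg = (len(x) - fft_size) // step + 1
    segments = np.stack([x[i * step: i * step + fft_size] for i in range(n_seg)])
    return np.fft.fft(segments * window, axis=1)

# Complex sinusoid of amplitude 2.5 centered on velocity bin 16
k = np.arange(128 * 10)
x = 2.5 * np.exp(2j * np.pi * 16 * k / 128)
X = segment_ffts(x)

envelope_bin16 = np.abs(X[:, 16])                  # one envelope sample per segment
conventional_bin16 = np.mean(envelope_bin16**2)    # second-order moment of the envelope
print(X.shape, envelope_bin16[:3], conventional_bin16)
```

Each column of `X` then provides the per-bin envelope samples whose second-order moment gives the conventional spectrum and whose lower-order moments would be used to estimate alpha and gamma.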

Fig. 13 contains four overlaid plots. The conventional Doppler spectrum (the Welch estimate) is
plotted with the simulated target (open circles) and without the target (no symbols). The
gamma Doppler spectrum, Eq. (38), is plotted with the target (filled circles) and without the
target (no symbols). The conventional and gamma spectra converge when $\alpha = 2$. Bin #16
without target is used as an example of inconsistent estimates in Fig. 1. The gamma spectrum
in dB is defined as

(38) 

The  target  is  not  seen  in  the  conventional  spectrum.  Fig.  14  shows  the  alpha  and  gamma 
spectra without the target present. The alpha spectrum shows alpha values around 1.25 on the
flanks of the gamma spectrum. The clutter Doppler spectra are clearly non-Gaussian. For bins
less than -25 and greater than +25, the radar system noise with $\alpha = 2$ dominates.

7.  Conclusions 

For  high-resolution,  horizontally  polarized  sea  clutter,  the  alpha-stable  distribution  can 
give  a  much  better  fit  to  the  envelope  of  the  clutter  returns  than  the  generally  accepted  K- 
distribution.  Sea  clutter  is  used  as  an  example.  The  negative  first-order  moment  estimate  of 
the  envelope  has  various  applications:  estimate  alpha  for  fitting  stable  distributions,  normalize 
the  envelope  density  function  to  fit  distributions,  estimate  the  median,  normalize  envelope 
detectors,  detect  targets,  and  set  clipper  level  for  detectors.  The  generalized  Cauchy  envelope 
distribution  gives  the  distribution  and  the  negative  first-order  moment  for  a  sinewave  plus 
Cauchy  noise.  The  statistical  error  for  the  lower-order  moments  and  estimates  of  gamma  are 
developed.  These  lead  to  methods  for  estimating  alpha  and  gamma  that  have  application  to 
short  data  segments.  These  methods  are  shown  to  produce  gamma  and  alpha  spectra  that  are 
consistent  and  give  additional  insight  into  the  spectral  fluctuations  that  occur  with  time. 




8.  Administrative  Information 

This  project  was  supported  by  the  Carderock  Division  of  the  Naval  Surface  Warfare 
Center's  In-house  Laboratory  Independent  Research  Program  sponsored  by  the  Office  of 
Naval  Research  and  administered  by  the  Research  Director,  Code  0112  under  Program 
Element 0601152N under NSWCCD Work Unit 1-7340-504.

9.  References 

1.  A.  D.  Whalen,  Detection  of  Signals  in  Noise,  (Academic  Press,  New  York,  1971). 

2.  M.  Shao  and  C.  L.  Nikias,  Signal  processing  with  fractional  lower  order  moments:  Stable 
processes  and  their  applications,  (IEEE  Proc.,  81,  July  1993)  pp.  986-1010. 

3. C. L. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications,
(John Wiley & Sons, New York, 1995).

4. G. Samorodnitsky and M. Taqqu, Stable Non-Gaussian Random Processes: Stochastic
Models with Infinite Variance, (Chapman & Hall, New York, 1994).

5. T. Nohara and S. Haykin, Canadian East Coast radar trials and the K-distribution, (IEE
Proc.-F, 138, no. 2, April 1991) pp. 80-88.

6. B. Armstrong and H. Griffiths, CFAR detection of fluctuating targets in spatially
correlated K-distributed clutter, (IEE Proc.-F, 138, no. 2, April 1991) pp. 139-152.

7.  G.A.  Tsihrintzis  and  C.L.  Nikias,  Incoherent  receivers  in  alpha-stable  impulsive  noise, 
(IEEE  Trans.  Signal  Processing,  43,  no.  9,  September  1995)  pp.  2225-2229. 

8.  P.  D.  Welch,  The  use  of  fast  Fourier  transform  for  the  estimation  of  power  spectra:  A 
method  based  on  time  averaging  over  short,  modified  periodograms,  (IEEE  Trans.  Audio 
Electroacoust.,  AU-15,  June  1967)  pp.  70-73. 


DETRENDING  TURBULENCE  TIME  SERIES  WITH  WAVELETS 


EDGAR  L  ANDREAS 

U.S.  Army  Cold  Regions  Research  and  Engineering  Laboratory 
72  Lyme  Road 

Hanover,  New  Hampshire  03755-1290,  U.S.A. 

E-Mail: eandreas@hanover-crrel.army.mil

and 

GEORGE  TREVINO 

Mechanical  Engineering  -  Engineering  Mechanics  Department 
Michigan  Technological  University 
Houghton,  Michigan  49931,  U.S.A. 

E-Mail: gtrevino@mtu.edu


ABSTRACT 


Wavelets  are  a  new  class  of  basis  functions  that  are  finding  wide  use  for  analyzing  and 
interpreting  turbulence  data.  Here  we  describe  a  new  use  for  wavelets:  identifying  trends 
in  turbulence  time  series.  The  general  turbulence  signal  we  consider  has  a  quadratic  trend. 

We  use  the  inverted  Haar  wavelet  and  the  elephant  wavelet,  respectively,  to  estimate  the 
first-order  and  second-order  coefficients  in  the  trend  polynomial.  Unlike  usual  wavelet 
applications,  however,  we  use  only  one  dilation  scale,  L,  where  L  is  the  total  length  of  the 
turbulence  series.  Our  analysis  shows  that  wavelet  trend  detection  is  roughly  half  as 
accurate  as  least-squares  trend  detection  when  accuracy  is  evaluated  in  terms  of  the  mean- 
square  error  in  estimates  of  the  first-order  and  second-order  trend  coefficients.  But 
wavelet  detection  is  more  than  twice  as  efficient  as  least-squares  detection  in  the  sense  that 
it  requires  fewer  than  half  the  number  of  floating  point  operations  of  least-squares 
regression to yield the three coefficients of the quadratic trend polynomial. We demonstrate
wavelet trend detection first with artificial data and then with various data collected
in  the  atmospheric  surface  layer.  Lastly,  we  provide  guidelines  on  when  linear  and 
quadratic  trends  are  “significant”  enough  to  require  removal  from  turbulence  series. 

1.  Introduction 

Removing  the  trend  from  a  turbulence  series  is  an  essential  but  arcane  art.  Bendat  and 
Piersol  (1971,  p.  291)  and  Panofsky  and  Dutton  (1984,  p.  175),  for  example,  explain  that 
failure  to  remove  a  trend  may  result  in  distorted  spectra  or  correlation  functions,  especially 
at low frequencies or longer time lags. Although many methods have been used for trend
removal (e.g., Bendat and Piersol 1971, p. 288 ff.; Priestley 1981, p. 587 ff.; Panofsky and
Dutton 1984, p. 87 ff.), none has emerged as the standard method, at least in the
atmospheric turbulence community (Panofsky and Dutton 1984, p. 175), and particularly
for nonstationary turbulence.

To  backtrack  for  a  moment,  our  working  definition  here  is  that  a  trend  is  any  component 
of  a  signal  with  a  period  longer  than  the  length  of  the  record.  In  essence,  a  series  with  a 
trend  is  nonstationary  (Jenkins  and  Watts  1968,  p.  151)  and  thus  an  appropriate  subject  for 
this  volume.  Although  we  focus  here  on  turbulence  time  series,  everything  we  say  applies 
directly  to  spatial  turbulence  series,  collected  by  aircraft,  for  example.  Simply  replace 
“time”  with  “distance,”  “frequency”  with  “wavenumber,”  and  “period”  with  “wavelength.” 

One  of  the  most  common  methods  for  detecting  trends  in  time  series  is  least-squares 
regression.  We  offer  here,  however,  an  alternative  to  least-squares  regression  based  on 
wavelets.  Wavelets  are  a  new  family  of  basis  functions  (Daubechies  1988,  Strang  1989, 
Farge  1992)  that  are  becoming  increasingly  popular  for  a  wide  range  of  meteorological 
applications  (e.g.,  Gao  and  Li  1993,  Collineau  and  Brunet  1993,  Mahrt  and  Gibson  1992, 
Mahrt  and  Howell  1994,  Katul  and  Parlange  1994,  Turner  and  Leclerc  1994).  We  show  how 
simple  it  is  to  use  wavelets  to  detect  trends  in  turbulence  time  series.  As  part  of  this 
demonstration,  we  develop  the  mathematics  for  using  wavelets  to  detect  first-order  and 
second-order  trends,  prove  that  wavelets  yield  unbiased  and  consistent  estimators  of  the 
trend,  offer  guidelines  on  when  it  is  necessary  to  remove  trends,  and  present  some  examples 
of  trend  detection  using  wavelets. 
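The idea of reading a linear trend off a single Haar-type wavelet spanning the whole record can be sketched as follows. This is a hedged illustration, not the estimator developed later in the paper: it assumes the inverted Haar wavelet equals -1 on the first half of the record and +1 on the second half, so that the difference of the half-record means of a series with trend slope b1 equals b1 times half the record length. The function name `haar_slope` is hypothetical, and `np.polyfit` supplies the least-squares comparison.

```python
import numpy as np

def haar_slope(series, dt=1.0):
    """Slope from the difference of the second-half and first-half means (Haar-type estimate)."""
    n = len(series) - len(series) % 2          # use an even number of points
    first, second = series[: n // 2], series[n // 2: n]
    return float((second.mean() - first.mean()) / (0.5 * n * dt))

rng = np.random.default_rng(0)
n, dt = 4096, 1.0
t = np.arange(n) * dt
true_slope = 0.01
series = 3.0 + true_slope * t + rng.standard_normal(n)   # trend plus unit-variance noise

slope_wavelet = haar_slope(series, dt)
slope_ls = float(np.polyfit(t, series, 1)[0])             # least-squares slope
print(slope_wavelet, slope_ls)
```

The Haar-type estimate needs only two running sums, which is the source of the computational saving, while the least-squares fit weights every point by its distance from the record center.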


Table 1. Consider two scenarios: a turbulence time series $\tilde{f}(t)$ with a linear trend, and a turbulence time series $\tilde{g}(t)$ with a quadratic trend. If N is the total number of points in the time series, the entries show how many floating-point operations are necessary to estimate the coefficients of the trend polynomial (the $\mu$'s) with our wavelet detection scheme and with least-squares regression. The wavelet operation counts come from the discrete forms of the expressions for $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ developed in Section 3. The least-squares operation counts come simply from counting operations in Bevington's (1969, p. 99 ff. and p. 137) expressions for $\hat{\mu}_{0,LS}$, $\hat{\mu}_{1,LS}$, and $\hat{\mu}_{2,LS}$ if we recognize that sums of the first, second, third, and fourth powers of $t_i = i\Delta$ are summable series requiring far fewer than N operations.

Trend                                                             Wavelets    Least-Squares Regression
Linear:     $\tilde{f}(t) = f(t) + \mu_0 + \mu_1 t$                2N + 8      3N + 14
Quadratic:  $\tilde{g}(t) = g(t) + \mu_0 + \mu_1 t + \mu_2 t^2$    3N + 23     6N + 73

Since least-squares regression is the primary alternative to our proposed wavelet trend-detection scheme, we also examine the expected accuracy of both least-squares regression and wavelet detection. Least-squares regression proves to be better than wavelet detection in terms of the mean-square error in the estimated coefficients of the polynomial trend, but wavelet detection has a computational advantage. Table 1 gives the bottom-line benefits of wavelet detection. If N is the number of points in a turbulence time series, wavelet detection requires roughly N fewer floating-point operations than least-squares regression for detecting a linear trend. In testing for a quadratic trend, wavelet detection requires less than half as many operations and is, therefore, more than twice as fast as least-squares regression.
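
The counts in Table 1 are easy to tabulate for a specific record length. The short sketch below simply evaluates the four expressions from Table 1; the function name `table1_counts` is ours and is only illustrative.

```python
def table1_counts(n):
    """Return the Table 1 floating-point operation counts for a series of n points."""
    return {
        "linear": {"wavelet": 2 * n + 8, "least_squares": 3 * n + 14},
        "quadratic": {"wavelet": 3 * n + 23, "least_squares": 6 * n + 73},
    }

counts = table1_counts(3000)  # 3000 points, the shorter record length used in Section 4
for trend, ops in counts.items():
    saved = ops["least_squares"] - ops["wavelet"]      # operations saved by wavelets
    ratio = ops["least_squares"] / ops["wavelet"]      # relative cost of least squares
    print(f"{trend}: wavelet={ops['wavelet']}, LS={ops['least_squares']}, "
          f"saved={saved}, ratio={ratio:.2f}")
```

For a 3000-point record, the saving is N + 6 operations in the linear case and 3N + 50 in the quadratic case.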

2.  Mathematical  Foundation 

Because  we  want  to  detect  trends  up  to  second  order,  we  use  two  wavelets  in  our  trend- 
detection  scheme,  the  inverted  Haar  wavelet  (Figure  1)  and  the  elephant  wavelet  (Figure  2). 
The  inverted  Haar  wavelet  at  time  t,  I(t,  L),  is 


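$$
\begin{aligned}
A^{-1} I(t,L) &= -1 && \text{if } -L/2 \le t < 0, && \text{(2.1a)} \\
&= 1 && \text{if } 0 \le t \le L/2, && \text{(2.1b)} \\
&= 0 && \text{elsewhere.} && \text{(2.1c)}
\end{aligned}
$$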

Fig. 1. Inverted Haar wavelet.

Fig. 2. Elephant wavelet.


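Here

$$A = \left(\frac{2}{L}\right)^{2}, \tag{2.2}$$

where L is the dilation scale of the wavelet, that is, the length of time it encompasses. The elephant wavelet, P(t,L), is

$$
\begin{aligned}
B^{-1} P(t,L) &= 1 && \text{if } -L/2 \le t < -L/6, && \text{(2.3a)} \\
&= -2 && \text{if } -L/6 \le t \le L/6, && \text{(2.3b)} \\
&= 1 && \text{if } L/6 < t \le L/2, && \text{(2.3c)} \\
&= 0 && \text{elsewhere,} && \text{(2.3d)}
\end{aligned}
$$

where

$$B = \frac{1}{2}\left(\frac{3}{L}\right)^{3}. \tag{2.4}$$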

Although  this  is  just  an  inverted  French  hat  wavelet,  we  prefer  the  less  cumbersome 
appellation  elephant  wavelet  because  the  wavelet  looks  like  an  elephant  to  us:  it  has  a  long 
trunk  and  two  big  ears. 

Admittedly, we are not being strictly rigorous in referring to Eq. (2.1) and (2.3) as wavelets. First, we use only one dilation scale in our trend-detection scheme, L, where $L = N\Delta$ is the total length of the time series and $\Delta$ is the sampling interval. Second, we ignore energy conservation considerations, which would dictate other values for A and B (Farge 1992). But, by Daubechies's (1988) definition (also Gamage and Hagelberg 1993), to be termed a wavelet, a function need only have compact support (be nonzero only over a finite domain) and satisfy an admissibility condition (have zero mean), which Eq. (2.1) and (2.3) do.

Although  we  first  deal  with  Eq.  (2.1)  and  (2.3)  as  continuous  functions,  for  practical 
time  series  analysis  we  must  use  the  discrete  forms.  For  the  inverted  Haar  wavelet,  the 
discrete  form  is 


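$$
\begin{aligned}
\left(\frac{L}{2}\right)^{2} I(i,N) &= -1 && \text{for } 1 \le i \le N/2, && \text{(2.5a)} \\
&= 1 && \text{for } N/2 + 1 \le i \le N, && \text{(2.5b)} \\
&= 0 && \text{otherwise.} && \text{(2.5c)}
\end{aligned}
$$

For the elephant wavelet, it is

$$
\begin{aligned}
2\left(\frac{L}{3}\right)^{3} P(i,N) &= 1 && \text{for } 1 \le i \le N/3, && \text{(2.6a)} \\
&= -2 && \text{for } N/3 + 1 \le i \le 2N/3, && \text{(2.6b)} \\
&= 1 && \text{for } 2N/3 + 1 \le i \le N, && \text{(2.6c)} \\
&= 0 && \text{otherwise.} && \text{(2.6d)}
\end{aligned}
$$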

Because  in  our  wavelet-based  detrending  scheme  we  ultimately  apply  both  Eq.  (2.5)  and 
(2.6)  to  the  same  time  series,  N  must  be  a  multiple  of  six  to  facilitate  segmenting  the  series. 
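
As a minimal sketch of the discrete lobes described above, the following Python builds the unnormalized weights of both wavelets for a series of N points. The function names are ours; the normalizing constants A and B are left out so that the lobe pattern and the zero-mean (admissibility) property are easy to see.

```python
def inverted_haar(n):
    """Unnormalized inverted Haar lobes: -1 on the first half, +1 on the second half."""
    if n % 2:
        raise ValueError("n must be even")
    return [-1] * (n // 2) + [1] * (n // 2)

def elephant(n):
    """Unnormalized elephant lobes: 1, -2, 1 on successive thirds of the series."""
    if n % 3:
        raise ValueError("n must be a multiple of 3")
    third = n // 3
    return [1] * third + [-2] * third + [1] * third

n = 12  # a multiple of six, so both wavelets segment the series cleanly
haar, eleph = inverted_haar(n), elephant(n)
print(haar)
print(eleph)
print(sum(haar), sum(eleph))  # both weight sums are zero, i.e., zero mean
```

Requiring N to be a multiple of six is what lets the half-series and third-series boundaries fall on sample indices at the same time.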


3.  Accuracy  Considerations 

3.1. Unbiased Estimators

Because the advantage that wavelet trend detection has over least-squares regression is in its efficiency in detecting quadratic trends (see Table 1), the general turbulence signal we consider is

$$\tilde{g}(t) = g(t) + \mu_0 + \mu_1 t + \mu_2 t^2. \tag{3.1}$$

Here, t is time; $\tilde{g}$ is the instantaneous (measured) value of the time series; g is the (zero-mean) turbulence component; and $\mu_0$, $\mu_1$, and $\mu_2$ are the coefficients of the quadratic trend polynomial. The $\tilde{g}$ series is defined (measured) only for $0 \le t \le L$. The purpose of trend detection is to estimate $\mu_0$, $\mu_1$, and $\mu_2$ and then to isolate the turbulence signal g(t). All that follows could also be adapted to a series with a linear trend by setting $\mu_2 = 0$.

In  general,  convolving  a  wavelet  with  a  time  series  defines  a  wavelet  coefficient  (e.g., 
Farge  1992).  Specifically,  convolving  g(t)  with  the  elephant  wavelet  yields  the  wavelet 
coefficient 


$$\chi(t,L) = \int_{-\infty}^{\infty} \tilde{g}(s)\, P(s-t,L)\, ds. \tag{3.2}$$

Using Eq. (2.3) gives

$$\chi(L/2,L) = B\int_{0}^{L/3} \tilde{g}(s)\, ds - 2B\int_{L/3}^{2L/3} \tilde{g}(s)\, ds + B\int_{2L/3}^{L} \tilde{g}(s)\, ds \tag{3.3}$$

since the shifted wavelet P(s - L/2, L) is zero outside [0, L]. Notice that, in most wavelet analyses, the support of the wavelet, L, is much shorter than the length of the time series. Thus, moving the wavelet through the time series according to Eq. (3.2) would yield a plethora of wavelet coefficients indexed to t and L. But in our analysis, Eq. (3.3) yields only one $\chi$ value, the one at time L/2, since the length of P exactly matches the length of $\tilde{g}$.

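With two changes of variables, Eq. (3.3) becomes

$$\chi(L/2,L) = B\int_{0}^{L/3} ds \left[\tilde{g}(s) - 2\tilde{g}\!\left(s+\frac{L}{3}\right) + \tilde{g}\!\left(s+\frac{2L}{3}\right)\right]. \tag{3.4}$$

Substitute Eq. (3.1) in for $\tilde{g}$ here. The result is

$$\chi(L/2,L) = B\int_{0}^{L/3} ds \left[g(s) - 2g\!\left(s+\frac{L}{3}\right) + g\!\left(s+\frac{2L}{3}\right)\right] + \mu_2. \tag{3.5}$$

Since g(t) has zero mean, the integral in Eq. (3.5) should be near zero. Thus, $\chi(L/2,L)$ is an estimator of $\mu_2$; call it $\hat{\mu}_2$. Formally, we can take the expectation of $\hat{\mu}_2$, $E[\hat{\mu}_2]$, to confirm this:

$$E[\hat{\mu}_2] = B\,E\!\left\{\int_{0}^{L/3} ds \left[g(s) - 2g\!\left(s+\frac{L}{3}\right) + g\!\left(s+\frac{2L}{3}\right)\right]\right\} + E[\mu_2]. \tag{3.6}$$

Since E[g(t)] = 0 by definition,

$$E[\hat{\mu}_2] = \mu_2. \tag{3.7}$$

That is, $\hat{\mu}_2 = \chi(L/2,L)$ is an unbiased estimator of $\mu_2$.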

Notice that the integrand in Eq. (3.4) resembles the second-difference operator (Haltiner and Williams 1980, p. 109; Press et al. 1994, p. 161). Therefore, it is not surprising that operating on a time series with the elephant wavelet yields a coefficient that estimates the second-order trend in the time series.
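
To see where the normalization comes from, it may help to carry the trend part of Eq. (3.1) through the bracket in Eq. (3.4) explicitly. The following is our own short check, writing $h = L/3$ for the lobe width; it is not a step reproduced from the original derivation.

```latex
\begin{aligned}
\mu_0 - 2\mu_0 + \mu_0 &= 0, \\
\mu_1\left[s - 2(s+h) + (s+2h)\right] &= 0, \\
\mu_2\left[s^2 - 2(s+h)^2 + (s+2h)^2\right] &= 2\mu_2 h^2, \\
B\int_0^{L/3} 2\mu_2 h^2\, ds
  = \frac{1}{2}\left(\frac{3}{L}\right)^{3} \cdot 2\mu_2\left(\frac{L}{3}\right)^{2} \cdot \frac{L}{3} &= \mu_2 .
\end{aligned}
```

The constant and linear parts of the trend cancel exactly, and the value of B in Eq. (2.4) is what scales the surviving quadratic part to $\mu_2$.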

Next, convolve $\tilde{g}(t)$ with the inverted Haar wavelet. This yields the wavelet coefficient X,

$$X(t,L) = \int_{-\infty}^{\infty} \tilde{g}(s)\, I(s-t,L)\, ds. \tag{3.8}$$

Using Eq. (2.1) and transforming variables gives

$$X(L/2,L) = A\int_{0}^{L/2} ds \left[\tilde{g}\!\left(s+\frac{L}{2}\right) - \tilde{g}(s)\right]. \tag{3.9}$$

Substituting Eq. (3.1) for $\tilde{g}$ and doing some algebra and two integrations yields

$$X(L/2,L) = A\int_{0}^{L/2} ds \left[g\!\left(s+\frac{L}{2}\right) - g(s)\right] + \mu_1 + \mu_2 L. \tag{3.10}$$

Consequently, since the integral should be near zero, X(L/2,L) combined with Eq. (3.4) provides an estimate of $\mu_1$ (call it $\hat{\mu}_1$), where

$$\hat{\mu}_1 = X(L/2,L) - \hat{\mu}_2 L. \tag{3.11}$$

Or, substituting Eq. (3.10) in (3.11),

$$\hat{\mu}_1 = \mu_1 + A\int_{0}^{L/2} ds \left[g\!\left(s+\frac{L}{2}\right) - g(s)\right] - (\hat{\mu}_2 - \mu_2)L. \tag{3.12}$$

To formally establish the validity of this estimator, take the expected value of Eq. (3.12):

$$E[\hat{\mu}_1] = E[\mu_1] + A\int_{0}^{L/2} ds\, E\!\left[g\!\left(s+\frac{L}{2}\right) - g(s)\right] - L\,E[\hat{\mu}_2 - \mu_2], \tag{3.13}$$

or

$$E[\hat{\mu}_1] = \mu_1, \tag{3.14}$$

since g has zero mean and because $\hat{\mu}_2$ is an unbiased estimator of $\mu_2$. Consequently, $\hat{\mu}_1$ is an unbiased estimator of $\mu_1$.

Notice that the integrand in Eq. (3.9) resembles the forward-difference or first-difference operator (Haltiner and Williams 1980, p. 109; Press et al. 1994, p. 161). Thus, it is clear why operating on a series with the inverted Haar wavelet yields a quantity related to the coefficient of the linear trend in the time series.
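
The "algebra and two integrations" behind Eq. (3.10) can be sketched as follows. This is our own working, using $A = (2/L)^2$ from Eq. (2.2), and is offered only as a check on the trend terms.

```latex
\begin{aligned}
\tilde{g}\!\left(s+\tfrac{L}{2}\right) - \tilde{g}(s)
  &= g\!\left(s+\tfrac{L}{2}\right) - g(s) + \mu_1\frac{L}{2} + \mu_2\left(sL + \frac{L^2}{4}\right), \\
A\int_0^{L/2}\left[\mu_1\frac{L}{2} + \mu_2\left(sL + \frac{L^2}{4}\right)\right] ds
  &= \frac{4}{L^2}\left[\mu_1\frac{L^2}{4} + \mu_2\left(\frac{L^3}{8} + \frac{L^3}{8}\right)\right]
   = \mu_1 + \mu_2 L .
\end{aligned}
```

The constant $\mu_0$ drops out of the difference immediately, which is why the inverted Haar coefficient carries no information about the mean level.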

In essence, Eq. (3.9) shows that we can estimate the linear trend in a time series by simply subtracting the average of the first half of the series from the average of the second half. Bendat and Piersol (1971, p. 289) and Trevino (1982) present similar methods for estimating the slope of a linear trend, although they use only portions of the time series, not the entire time series as we do.
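
For a purely linear trend, the half-average recipe can be sketched in a few lines of Python. With $A = (2/L)^2$, the discrete sum in Eq. (3.9) reduces to (2/L) times the difference of the two half-series means; the function name `haar_slope` and the sample values are ours and purely illustrative.

```python
def haar_slope(series, delta):
    """Slope estimate for a linear trend: (2/L) * (mean of 2nd half - mean of 1st half)."""
    n = len(series)
    if n % 2:
        raise ValueError("series length must be even")
    half = n // 2
    first_mean = sum(series[:half]) / half
    second_mean = sum(series[half:]) / half
    record_length = n * delta  # L = N * delta
    return 2.0 * (second_mean - first_mean) / record_length

delta = 0.1  # sampling interval in seconds (illustrative value)
# Noise-free series with intercept 3.0 and slope -0.02 per second.
series = [3.0 - 0.02 * (i * delta) for i in range(1, 601)]
slope = haar_slope(series, delta)
print(slope)
```

On noise-free linear data the recipe returns the slope to within floating-point rounding; with turbulence added, the scatter of the estimate is what Section 3.2 quantifies.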

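The sample average of $\tilde{g}$ is

$$\bar{\tilde{g}} = \frac{1}{L}\int_{0}^{L} \tilde{g}(s)\, ds. \tag{3.15}$$

Substituting Eq. (3.1) for $\tilde{g}$, we get

$$\bar{\tilde{g}} = \frac{1}{L}\int_{0}^{L} g(s)\, ds + \mu_0 + \frac{1}{2}\mu_1 L + \frac{1}{3}\mu_2 L^2. \tag{3.16}$$

Again, because g should experimentally have zero mean, an estimator for $\mu_0$ is

$$\hat{\mu}_0 = \bar{\tilde{g}} - \frac{1}{2}\hat{\mu}_1 L - \frac{1}{3}\hat{\mu}_2 L^2. \tag{3.17}$$

Or, from Eq. (3.16),

$$\hat{\mu}_0 = \mu_0 + \frac{1}{L}\int_{0}^{L} g(s)\, ds - \frac{1}{2}L(\hat{\mu}_1 - \mu_1) - \frac{1}{3}L^2(\hat{\mu}_2 - \mu_2). \tag{3.18}$$

On taking the expected value of this, we see

$$E[\hat{\mu}_0] = E[\mu_0] + \frac{1}{L}\int_{0}^{L} E[g(s)]\, ds - \frac{1}{2}L\,E[\hat{\mu}_1 - \mu_1] - \frac{1}{3}L^2 E[\hat{\mu}_2 - \mu_2]. \tag{3.19}$$

Thus,

$$E[\hat{\mu}_0] = \mu_0, \tag{3.20}$$

since g has zero mean and $\hat{\mu}_1$ and $\hat{\mu}_2$ are unbiased estimators of $\mu_1$ and $\mu_2$, respectively. That is, our wavelet-derived estimate of $\mu_0$ is also an unbiased estimator.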
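
The three estimators can be combined into one routine. The sketch below is our own discretization of Eqs. (3.4), (3.9), (3.11), and (3.17), not the discrete forms used for the operation counts in Table 1. It assumes that sample i (i = 1, ..., N) is taken at the cell midpoint $t_i = (i - 1/2)\Delta$, so that the sums mimic the integrals; under that assumption $\hat{\mu}_1$ and $\hat{\mu}_2$ are exact for a noise-free quadratic, and $\hat{\mu}_0$ carries only a small offset of order $\mu_2\Delta^2$.

```python
def wavelet_trend(series, delta):
    """Estimate (mu0, mu1, mu2) of a quadratic trend from one inverted Haar
    and one elephant wavelet coefficient.

    Assumption of this sketch: sample i (1-based) sits at t = (i - 0.5) * delta.
    """
    n = len(series)
    if n % 6:
        raise ValueError("series length must be a multiple of six")
    length = n * delta                 # L, the record length
    a = (2.0 / length) ** 2            # normalization A
    b = 0.5 * (3.0 / length) ** 3      # normalization B
    third, half = n // 3, n // 2

    # Elephant coefficient: lobes 1, -2, 1 on successive thirds of the series.
    mu2 = b * delta * (sum(series[:third])
                       - 2.0 * sum(series[third:2 * third])
                       + sum(series[2 * third:]))
    # Inverted Haar coefficient: second half minus first half.
    haar_coeff = a * delta * (sum(series[half:]) - sum(series[:half]))
    mu1 = haar_coeff - mu2 * length
    # The sample mean, corrected for the estimated linear and quadratic terms.
    mean = sum(series) / n
    mu0 = mean - 0.5 * mu1 * length - mu2 * length ** 2 / 3.0
    return mu0, mu1, mu2

delta, n = 0.1, 3000
true_mu = (3.0, -2.0e-3, 1.5e-5)  # illustrative coefficients chosen for this sketch
series = [true_mu[0] + true_mu[1] * t + true_mu[2] * t * t
          for t in ((i - 0.5) * delta for i in range(1, n + 1))]
mu0, mu1, mu2 = wavelet_trend(series, delta)
print(mu0, mu1, mu2)
```

Only three passes over the data (two of which can share partial sums) are needed, which is the source of the low operation counts.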


3.2.  Consistent  Estimators 


A statistic for evaluating the accuracy of the estimators we defined in the last section is the mean-square error (Trevino 1982) or error variance (Lenschow et al. 1994). With $\hat{\mu}_2$ as an example, the mean-square-error (mse) operator is defined as

$$\mathrm{MSE}[\hat{\mu}_2] = E\!\left[(\hat{\mu}_2 - \mu_2)^2\right]. \tag{3.21}$$

In Appendix A, we derive the mse of each estimator that we defined in the last section. These are all functions of the integral time scale $\mathfrak{I}$, which is defined as

$$\mathfrak{I} = \sigma_g^{-2}\int_{0}^{\infty} C_g(\tau)\, d\tau. \tag{3.22}$$

Here, $\sigma_g^2$ is the population variance of the g time series, and $C_g$ is the autocorrelation of g,

$$C_g(\tau) = E[g(t)\, g(t+\tau)], \tag{3.23}$$

where $\tau$ is the time lag. We also recognize $C_g(\tau)/\sigma_g^2$ as the normalized autocorrelation function.

In  Appendix  A  we  show  that 

MSE[Ao]  =  21.25 

(3.24a) 

MSE[A.]  =  f  (gc|, 

(3.24b) 

MSE[(1,]  = 

(3.24c) 

If $\mathfrak{I}$ is finite and much smaller than the total series length L, $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ are consistent estimators: the mse of each goes to zero as the series length (number of samples) increases.

Because, from our point of view, least-squares regression is the primary alternative to our wavelet-based trend detection, in Appendix B we likewise investigate the statistical properties of the least-squares estimators of $\mu_0$, $\mu_1$, and $\mu_2$. Call these estimators $\hat{\mu}_{0,LS}$, $\hat{\mu}_{1,LS}$, and $\hat{\mu}_{2,LS}$. In Appendix B, we show that, as with $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$, the estimators $\hat{\mu}_{0,LS}$, $\hat{\mu}_{1,LS}$, and $\hat{\mu}_{2,LS}$ are unbiased estimators of $\mu_0$, $\mu_1$, and $\mu_2$, respectively. We also find that

MSE[Ao,ls]  =  (3-25a) 

MSE[A,,us]  =  (3.25b) 

MSE[|l2,Ls]  =  (3.25c) 

That is, the least-squares estimators all provide consistent estimates of the quantities they are intended to predict. On comparing Eq. (3.24) and (3.25), we see that $\mathrm{MSE}[\hat{\mu}_{0,LS}]$ is somewhat less than $\mathrm{MSE}[\hat{\mu}_0]$. The mse's of the other two least-squares estimators are roughly half as large as the mse's of the corresponding wavelet-based estimators. Thus, in the mse sense, least-squares trend detection is better than wavelet-based trend detection. But remember Table 1: this factor-of-two increase in accuracy comes at the expense of a factor-of-two increase in computing operations.

4.  Tests  with  Artificial  Data 

To test our wavelet-based trend detection scheme and to verify the statistical properties of the estimators that we derived in the last section, we constructed some artificial time series with known quadratic trends. Each time series is of the form Eq. (3.1), where we have specified $\mu_0$, $\mu_1$, and $\mu_2$. We generated g(t) by randomly sampling from a Gaussian probability distribution that had zero mean and specified standard deviation $\sigma_g$.

For each set of $\mu_0$, $\mu_1$, and $\mu_2$, we created 40 artificial time series. The first 20 series each consisted of 3000 $\tilde{g}$ values (this is N) but differed because the sequence of random g values differed. The second 20 time series consisted of 12,000 $\tilde{g}$ values so we could see how quadrupling L affects $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ [see Eq. (3.24)].

Figure  3  shows  a  typical  12,000-point  time  series  with  our  first  quadratic  trend.  A 
typical  3000-point  time  series  with  the  same  trend  would  just  be  the  left  one-quarter  of  this 
figure.  The  units  are  unimportant  here  since  g  could  really  be  any  quantity  and  t  in  Eq.  (3.1) 
could  be  replaced  with  i,  the  index  of  the  time  step. 

Figure 4 shows our wavelet-based estimates of $\mu_0$, $\mu_1$, and $\mu_2$ for each run in the twenty 3000-point series and in the twenty 12,000-point series. After we computed $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ for each artificial series, we detrended $\tilde{g}$ and then computed the sample standard deviation, s, of the remaining g series. These s values also appear in Figure 4.

The variability of the estimates in Figure 4, especially in the 3000-point panels, shows how the sequence of data can lead to random scatter. The estimates are, however, evenly scattered both above and below the value being estimated. Thus, our analysis of these artificial data supports our prediction that $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ are unbiased estimators.

The startling difference between the 3000-point and 12,000-point panels in Figure 4 is the record-length effect. We showed in Eq. (3.24) that $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ should be consistent estimators. The differences between the left and right panels confirm this: $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ all are much closer to the true values of $\mu_0$, $\mu_1$, and $\mu_2$ when the record length is four times longer. Even s is a better estimate of the true population standard deviation with the longer record because the better estimates of $\mu_0$, $\mu_1$, and $\mu_2$ facilitate more accurate trend removal and, consequently, a better estimate of the terms in the g series.
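
An experiment of this kind can be sketched with the Python standard library. Everything specific here is an assumption of the sketch rather than a detail of the original tests: the trend coefficients, the seed, the unit sampling interval (time measured in sample index), the midpoint sampling times, and the use of uncorrelated Gaussian noise.

```python
import math
import random

def wavelet_trend(series, delta):
    """Wavelet estimates (mu0, mu1, mu2); samples assumed at cell midpoints."""
    n = len(series)
    length = n * delta
    third, half = n // 3, n // 2
    mu2 = 0.5 * (3.0 / length) ** 3 * delta * (
        sum(series[:third]) - 2.0 * sum(series[third:2 * third]) + sum(series[2 * third:]))
    mu1 = (2.0 / length) ** 2 * delta * (sum(series[half:]) - sum(series[:half])) - mu2 * length
    mu0 = sum(series) / n - 0.5 * mu1 * length - mu2 * length ** 2 / 3.0
    return mu0, mu1, mu2

def rms_errors(n, runs, rng, mu=(3.0, -2.0e-3, 1.5e-7), sigma=0.5, delta=1.0):
    """Root-mean-square error of each estimate over `runs` synthetic series."""
    squared = [0.0, 0.0, 0.0]
    for _ in range(runs):
        series = []
        for i in range(1, n + 1):
            t = (i - 0.5) * delta
            # Gaussian "turbulence" riding on a known quadratic trend.
            series.append(rng.gauss(0.0, sigma) + mu[0] + mu[1] * t + mu[2] * t * t)
        for j, estimate in enumerate(wavelet_trend(series, delta)):
            squared[j] += (estimate - mu[j]) ** 2
    return [math.sqrt(s / runs) for s in squared]

rng = random.Random(1995)  # fixed seed so the run is repeatable
short = rms_errors(3000, 20, rng)
long_ = rms_errors(12000, 20, rng)
print("rms errors, N = 3000: ", short)
print("rms errors, N = 12000:", long_)
```

With the record four times longer, the errors in all three estimates should shrink, most strongly for $\hat{\mu}_2$ and least for $\hat{\mu}_0$.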

Figures 5 and 6 repeat this story using artificial data with stronger first-order and second-order trends. Again, the left one-quarter of Figure 5 is typical of the time series used for the 3000-point analyses depicted in Figure 6. By eye it is difficult to see a second-order trend in this portion of the time series; but the wavelet analysis, nevertheless, can still estimate $\mu_2$ to within about ±10% from these 3000 points (see the $\hat{\mu}_2$ panel on the left side of Figure 6).

Fig. 3. Typical artificial time series generated with $\sigma_g = 0.500$, $\mu_0 = 3.00$, $\mu_1 = -2.00 \times 10^{-3}$, and $\mu_2 = 1.50 \times 10^{\text{[illegible]}}$.

Fig. 4. Wavelet-based estimators $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ for twenty 3000-point series (left panels) and for twenty 12,000-point random series (right panels), each with the quadratic trend depicted in Fig. 3. The horizontal axis of each panel is the run number. In the lowest panel, s is the sample standard deviation of the turbulence signal (g) computed using $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ to remove the trend from the $\tilde{g}$ series. In each panel, the horizontal line is the prescribed value of the quantity being estimated.

In comparing the left and right panels of Figure 6, we see results as in Figure 4. $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ are unbiased estimators of $\mu_0$, $\mu_1$, and $\mu_2$, respectively. And all are consistent estimators: with the record length increased by a factor of four, all are much nearer to the values they are intended to estimate. The sample standard deviation, s, also estimates $\sigma_g$ better for the longer time series because the better estimates of $\mu_0$, $\mu_1$, and $\mu_2$ provide more accurate trend removal and thus a better determination of the g series.

5. Criteria for Removing Trends

As we mentioned, many sources discuss techniques for trend detection, and most explain the necessity for removing “significant” trends. Most also, however, fail to define “significant.” We feel that the trend-detection problem really has two parts: (1) estimate the coefficients of the polynomial trend; (2) decide whether to remove this trend from the time series. Without addressing part 2, a trend-detection scheme is incomplete. Hence, here we provide some quantitative criteria for deciding when to remove an estimated trend.

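Consider the discrete version of the time series in Eq. (3.1),

$$\tilde{g}_i = g_i + \mu_0 + \mu_1 i\Delta + \mu_2 i^2\Delta^2, \tag{5.1}$$

where i is the sample index, $1 \le i \le N$, and $\Delta$ is the sampling interval. For example, with a sonic anemometer/thermometer operating in the atmospheric surface layer, $\Delta$ might be 0.1 second.

By analogy with Eq. (3.15), the sample average of $\tilde{g}$ in discrete form is

$$\bar{\tilde{g}} = \frac{1}{N}\sum_{i=1}^{N} \tilde{g}_i, \tag{5.2}$$

or

$$\bar{\tilde{g}} = \frac{1}{N}\sum_{i=1}^{N} \left[g_i + \mu_0 + \mu_1 i\Delta + \mu_2 i^2\Delta^2\right]. \tag{5.3}$$

We can easily sum these i series to get

$$\bar{\tilde{g}} = \frac{1}{N}\sum_{i=1}^{N} g_i + \mu_0 + \frac{\mu_1\Delta}{2}(N+1) + \frac{\mu_2\Delta^2}{6}(N+1)(2N+1), \tag{5.4}$$

which is analogous to Eq. (3.16).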



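By definition, the unbiased sample variance of $\tilde{g}$ is

$$s_{\tilde{g}}^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(\tilde{g}_i - \bar{\tilde{g}}\right)^2. \tag{5.5}$$

From Eq. (5.1) and (5.4), this is

$$s_{\tilde{g}}^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left[g_i - \frac{1}{N}\sum_{k=1}^{N} g_k + \mu_1\Delta\left(i - \frac{N+1}{2}\right) + \mu_2\Delta^2\left(i^2 - \frac{(N+1)(2N+1)}{6}\right)\right]^2. \tag{5.6}$$

On multiplying this out and taking the expected value, we get

$$
\begin{aligned}
\sigma_{\tilde{g}}^2 = \sigma_g^2 + \frac{1}{N-1}\sum_{i=1}^{N} \Big\{ & \mu_1^2\Delta^2\left[i^2 - i(N+1) + \tfrac{1}{4}(N+1)^2\right] \\
& + 2\mu_1\mu_2\Delta^3\left[i^3 - \tfrac{i^2}{2}(N+1) - \tfrac{i}{6}(N+1)(2N+1) + \tfrac{1}{12}(N+1)^2(2N+1)\right] \\
& + \mu_2^2\Delta^4\left[i^4 - \tfrac{i^2}{3}(N+1)(2N+1) + \tfrac{1}{36}(N+1)^2(2N+1)^2\right] \Big\}.
\end{aligned}
\tag{5.7}
$$

Summing the i series finally yields

$$\sigma_{\tilde{g}}^2 = \sigma_g^2 + \frac{\Delta^2 N(N+1)}{180}\left[15\mu_1^2 + 30\mu_1\mu_2\Delta(N+1) + \mu_2^2\Delta^2(2N+1)(8N+11)\right], \tag{5.8}$$

where $\sigma_{\tilde{g}}^2$ and $\sigma_g^2$ are, respectively, the population variances of the $\tilde{g}$ and g series. Equation (5.8) establishes how badly a series with a trend misrepresents the variance of the actual turbulence signal.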
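
As a quick numerical check on the closed form in Eq. (5.8), the sketch below compares its trend term with the sample variance (N - 1 denominator) of a noise-free quadratic trend sampled at $t_i = i\Delta$. The function names and the coefficient values are ours and purely illustrative.

```python
def trend_variance_excess(mu1, mu2, delta, n):
    """Trend contribution to the variance: the last term of Eq. (5.8)."""
    return (delta ** 2 * n * (n + 1) / 180.0) * (
        15.0 * mu1 ** 2
        + 30.0 * mu1 * mu2 * delta * (n + 1)
        + mu2 ** 2 * delta ** 2 * (2 * n + 1) * (8 * n + 11)
    )

def sample_variance(values):
    """Unbiased sample variance, as in Eq. (5.5)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

mu1, mu2, delta, n = -2.0e-3, 1.5e-5, 0.1, 3000  # illustrative values
# Trend alone (no turbulence), sampled at t_i = i * delta for i = 1..N.
trend = [mu1 * i * delta + mu2 * (i * delta) ** 2 for i in range(1, n + 1)]
closed_form = trend_variance_excess(mu1, mu2, delta, n)
brute_force = sample_variance(trend)
print(closed_form, brute_force)
```

The two numbers agree to rounding error, and the constant $\mu_0$ plays no role because it drops out of any variance.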

Suppose the instrument we are using to sample $\tilde{g}$ has a resolution r. For example, sonic anemometer/thermometers made by Applied Technologies Inc. (ATI; Boulder, Colorado) provide a digital output reported to the nearest cm/s for components of the wind vector and to the nearest 0.01°C for temperature. For such instruments, r would then presumably be 1 cm/s for wind speed measurements and 0.01°C for temperature measurements. Since we cannot measure $\sigma_g$ with more precision than our instrument provides, we cannot confidently identify a trend in our data unless

$$\sigma_{\tilde{g}}^2 - \sigma_g^2 > r^2. \tag{5.9}$$

From Eq. (5.8) this criterion requires that we detrend our data only when

$$\frac{\Delta^2 N(N+1)}{180}\left[15\hat{\mu}_1^2 + 30\hat{\mu}_1\hat{\mu}_2\Delta(N+1) + \hat{\mu}_2^2\Delta^2(2N+1)(8N+11)\right] \equiv D_2^2 > r^2. \tag{5.10}$$

If we suspect that our time series has only a linear trend, not a quadratic one, the criterion for detrending is [from Eq. (5.8) with $\mu_2 = 0$]

$$\frac{\Delta^2 N(N+1)}{12}\hat{\mu}_1^2 \equiv D_1^2 > r^2. \tag{5.11}$$


In summary (and this is valid for any trend-detection scheme, not just wavelet-based detection), once we have computed $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$, we can evaluate $D_1^2$ or $D_2^2$, whichever we feel is appropriate. If the result is larger than the square of our presumed sensor resolution, we must detrend the $\tilde{g}$ series before computing turbulence statistics to avoid a “significant” bias in the values.
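
The decision rule can be sketched as a pair of small functions that evaluate the left-hand sides of Eqs. (5.10) and (5.11) and compare them with the squared resolution. The function names and the numerical estimates below are hypothetical; only the formulas come from Eq. (5.8).

```python
def d1_squared(mu1_hat, delta, n):
    """Variance bias from a linear trend, the left-hand side of Eq. (5.11)."""
    return delta ** 2 * n * (n + 1) * mu1_hat ** 2 / 12.0

def d2_squared(mu1_hat, mu2_hat, delta, n):
    """Variance bias from a quadratic trend, the left-hand side of Eq. (5.10)."""
    return (delta ** 2 * n * (n + 1) / 180.0) * (
        15.0 * mu1_hat ** 2
        + 30.0 * mu1_hat * mu2_hat * delta * (n + 1)
        + mu2_hat ** 2 * delta ** 2 * (2 * n + 1) * (8 * n + 11)
    )

def must_detrend(bias_variance, resolution):
    """True when the trend biases the variance by more than resolution squared."""
    return bias_variance > resolution ** 2

delta, n = 0.1, 3000                 # 300 s sampled at 10 Hz
mu1_hat, mu2_hat = -1.0e-3, 2.0e-6   # hypothetical estimates, not values from the data
resolution = 0.01                    # e.g., a thermometer reporting to 0.01 degrees C
decision = must_detrend(d2_squared(mu1_hat, mu2_hat, delta, n), resolution)
print(decision)
```

Because the rule depends only on the estimated coefficients, the record geometry, and the instrument resolution, it applies equally to coefficients obtained by least-squares regression.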


6.  Examples  with  Real  Data 

We have several data sets available with which to demonstrate wavelet-based trend detection. In August 1991, we collected over 60 hours of turbulence data with an ATI sonic anemometer/thermometer at the Sevilleta Long-Term Ecological Refuge, a semi-arid grassland near Socorro, New Mexico (Andreas 1995). The sonic was positioned 4 m above the surface and sampled at 10 Hz (i.e., $\Delta$ = 0.1 s). We will use two short time series from this collection as our first examples. Both series come from the evening transition, when we would expect air temperature to begin falling. Hence, we look at sections of the sonic temperature record.

Figure 7 shows 300 s of sonic temperature data collected at about 8 p.m. local time. The measured temperature decreased during this interval but only by about 0.3°C. Our wavelet analysis, as described in Section 3, yields $\hat{\mu}_0 = 21.27$°C, $\hat{\mu}_1 = -7.126 \times 10^{\text{[illegible]}}$ °C/s, and $\hat{\mu}_2 = -3.313 \times 10^{\text{[illegible]}}$ °C/s² for this series. From Eq. (5.10), $D_2$ is then 0.09°C. If we take the resolution of the sonic thermometer to be 0.01°C, the reporting increment of the instrument, Eq. (5.10) requires that we remove the trend. Figure 7 also shows the quadratic trend that we removed and the detrended series. The original series has a standard deviation of 0.14°C; the detrended series, 0.12°C. Thus, we would have overestimated the standard deviation of this series by 17% if we had not detrended.

Figure 8 demonstrates why detrending is necessary. In it we plot the autocorrelation functions for the original and detrended series in Figure 7. Clearly, the autocorrelation function of the original series does not approach zero, even with a lag of 20 s, as we would expect of true surface-layer turbulence. As a result, the integral scale [see Eq. (3.22)] of the original time series is 7.5 s, while, for the detrended series, it is an order of magnitude shorter, 0.79 s.

Fig. 7. Three hundred seconds of sonic thermometer data collected near 8 p.m. on 4 August 1991 at the Sevilleta Long-Term Ecological Refuge (top), the quadratic trend that we identified and removed (middle), and the detrended series (bottom). The horizontal axis is the time step; each time step is 0.1 s.

Fig. 8. Autocorrelation functions computed for the original and detrended series in Fig. 7. Integral scales of the original and detrended series are 7.5 and 0.79 s, respectively.

Figure 9 shows a second Sevilleta sonic temperature series collected during another evening transition. Again, the original temperature series seems to be generally decreasing. Our wavelet analysis yields $\hat{\mu}_0 = 25.25$°C, $\hat{\mu}_1 = 2.003 \times 10^{\text{[illegible]}}$ °C/s, and $\hat{\mu}_2 = -8.684 \times 10^{\text{[illegible]}}$ °C/s². From Eq. (5.10), $D_2$ is 0.08°C. Again, this is larger than the resolution in our temperature measurement, 0.01°C, so we must detrend. Figure 9 also shows that our detrending really levels the time series, which first increased in the mean and then decreased (Figure 9, middle panel).

The  standard  deviation  of  the  original  time  series  in  Figure  9  is  0.12°C;  for  the 
detrended  series,  it  is  0.09°C.  Therefore,  using  the  original  series  to  compute  the  standard 
deviation  in  temperature  would  have  produced  a  value  33%  higher  than  in  the  actual 
turbulence  signal. 

Lastly,  the  autocorrelation  functions  (Figure  10)  of  the  original  and  detrended  series 
shown  in  Figure  9  again  emphasize  the  importance  of  detrending.  The  original  series  yields 
an  autocorrelation  function  that  goes  to  zero  much  more  slowly  than  the  one  for  the 
detrended  series.  From  Figure  10  we  compute  the  integral  scale  of  the  original  series  to  be 
8.1  s,  while  the  integral  scale  of  the  detrended  series  is  only  3.3  s. 

Fig. 10. Autocorrelation functions computed for the original and detrended series in Fig. 9. Integral scales for the original and detrended series are 8.1 and 3.3 s, respectively.


A second data set to which we apply our trend-detection scheme comes from Ice Station Weddell (ISW), a camp floating on the sea ice in the western Weddell Sea. On ISW we collected over 2000 hours of data with an R.M. Young (Traverse City, Michigan) propeller-vane anemometer mounted 5 m above the sea ice and sampled at 10 Hz (Andreas et al. 1992).

Figure 11 shows 5 minutes of wind speed data collected with the propeller-vane on 24 May 1992 at about 3 a.m. local time (Julian day 145, 0500 GMT) at an air temperature of -25°C. Our wavelet analysis shows an almost linear trend in the series (Figure 11, middle panel): $\hat{\mu}_0 = 7.93$ m/s, $\hat{\mu}_1 = -2.881 \times 10^{\text{[illegible]}}$ m/s², and $\hat{\mu}_2 = 8.100 \times 10^{\text{[illegible]}}$ m/s³. With a nominal resolution for the propeller-vane wind speed of 0.05 m/s (Andreas and Claffey 1995), Eq. (5.10) requires that we remove this trend since $D_2$ = 0.30 m/s. The detrended series (Figure 11, bottom panel) has a standard deviation of 0.73 m/s, while the original series has a standard deviation of 0.78 m/s.

Figure  12  shows  how  detrending  improves  the  behavior  of  the  autocorrelation  function 
for  the  series  in  Figure  11.  The  original  series  has  an  integral  scale  of  4.8  s;  the  detrended 
series,  2.9  s. 

Finally, we close this section with an example of potential problems. Figure 13 shows another series of ISW propeller-vane wind speed data collected about 9 minutes after the series in Figure 11. Our wavelet calculations yield polynomial trend coefficients of $\hat{\mu}_0 = 7.91$ m/s, $\hat{\mu}_1 = 1.411 \times 10^{\text{[illegible]}}$ m/s², and $\hat{\mu}_2 = -7.068 \times 10^{\text{[illegible]}}$ m/s³. These values yield $D_2$ = 0.18 m/s; Eq. (5.10) thus again requires detrending for a sensor resolution of 0.05 m/s.

Fig. 11. Three hundred seconds of propeller-vane anemometer data collected at -25°C on 24 May 1992 on Ice Station Weddell (top), the quadratic trend that we computed and removed (middle), and the detrended series (bottom).

Fig. 12. Autocorrelation functions computed for the original and detrended series in Fig. 11. Integral scales for the original and detrended series are 4.8 and 2.9 s, respectively.


But the detrended series has a standard deviation of 0.81 m/s, while the original series has a standard deviation of only 0.80 m/s. Figure 14 shows that detrending also degrades the autocorrelation function. The original series has an integral scale of 4.7 s, while, for the detrended series, it is 5.1 s. From Eq. (5.8) it is possible to prove that the population variance of the turbulence alone ($\sigma_g^2$) is never greater than the population variance of the turbulence signal riding on a trend ($\sigma_{\tilde{g}}^2$). That is, the bracketed quantity on the right-hand side of Eq. (5.8) is never negative. We computed sample variances that violate this constraint because $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ have random uncertainties. In the example in Figures 13 and 14, detrending introduced more variability into the resulting series through these uncertainties than the trend explained. Consequently, as a final criterion for trend removal, you must compare the variance of the original series with the variance of the detrended series. The latter must always be less, or the detrending is serving the opposite of the purpose for which it is intended.
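
This final check is simple to automate. The following sketch uses the standard-library `statistics.variance`; the function name `detrending_helped` and the six-point series are made up for illustration.

```python
import statistics

def detrending_helped(original, detrended):
    """Keep the detrended series only if its sample variance is smaller
    than the sample variance of the original series."""
    return statistics.variance(detrended) < statistics.variance(original)

original = [1.0, 1.2, 1.1, 1.5, 1.4, 1.8]                   # made-up values with an upward drift
detrended = [x - 0.14 * i for i, x in enumerate(original)]  # remove an assumed linear drift
helped = detrending_helped(original, detrended)
print(helped)
```

If the function returns False, as it would for the series in Figure 13, the estimated trend should not be removed.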

7.  Conclusions 

We  have  described  a  new,  wavelet-based  method  for  detecting  linear  and  quadratic 
trends  in  turbulence  time  series.  Actually,  though,  there  is  nothing  about  our  analysis  that 
makes  it  specific  to  turbulence  series.  It  would  work  just  as  well  on  a  climatic  time  series  or 
a  series  of  Dow  Jones  averages. 


Fig.  14.  Autocorrelation  functions  computed  for  the  original  and 
detrended  series  in  Fig.  13.  Integral  scales  for  the  original  and 
detrended  series  are  4.7  and  5.1  s,  respectively. 


Fundamentally,  wavelet  trend  detection  works  because  the  inverted  Haar  wavelet  is  a 
first-difference  operator;  it  picks  out  the  first-order  coefficient  of  a  quadratic  trend.  In  turn, 
the  elephant  wavelet  is  a  second-difference  operator  and,  as  such,  picks  out  the  second- 
order  coefficient  of  the  trend  polynomial. 

Although here we focused on polynomial terms only up to second order, it would be easy to extend our method to detect higher order trends by defining new (unnamed) wavelets. As we have shown, the wavelet for detecting a linear trend has lobes of -1 and 1 with a multiplier of $(2/L)^2$. The wavelet for detecting a second-order trend has lobes of 1, -2, and 1 with a multiplier of $(3/L)^3/2$. By extension, the wavelet for detecting a third-order trend would have lobes of -1, 3, -3, and 1 with a multiplier of $(4/L)^4/6$. And, in general, the wavelet to detect an nth-order trend would have lobes that are the coefficients of an nth-order, alternating binomial series with multiplier $[(n+1)/L]^{n+1}/n!$.
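
The general recipe can be sketched as follows. The function names are ours, and `top_coefficient` is only an illustration of how such a wavelet would be applied to a sampled series whose length is a multiple of n + 1; it estimates the highest-order coefficient alone.

```python
import math

def trend_wavelet(order, length):
    """Lobes and multiplier of the wavelet that picks out the order-th coefficient."""
    lobes = [(-1) ** (order - k) * math.comb(order, k) for k in range(order + 1)]
    multiplier = ((order + 1) / length) ** (order + 1) / math.factorial(order)
    return lobes, multiplier

def top_coefficient(series, delta, order):
    """Estimate the highest-order trend coefficient of a sampled series."""
    n = len(series)
    if n % (order + 1):
        raise ValueError("series length must be a multiple of order + 1")
    seg = n // (order + 1)  # number of samples under each lobe
    lobes, multiplier = trend_wavelet(order, n * delta)
    weighted = sum(c * sum(series[k * seg:(k + 1) * seg]) for k, c in enumerate(lobes))
    return multiplier * delta * weighted

print(trend_wavelet(1, 2.0))  # lobes [-1, 1]
print(trend_wavelet(2, 3.0))  # lobes [1, -2, 1]
print(trend_wavelet(3, 4.0))  # lobes [-1, 3, -3, 1]

# Noise-free cubic sampled at t = i * delta; the cubic coefficient is 0.25.
delta = 0.5
cubic = [2.0 + t - 0.5 * t ** 2 + 0.25 * t ** 3
         for t in (i * delta for i in range(1, 13))]
mu3 = top_coefficient(cubic, delta, 3)
print(mu3)
```

The multiplier works because the nth finite difference of $t^n$ over a step $h = L/(n+1)$ is the constant $n!\,h^n$, while all lower-order terms difference to zero.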

Wavelet-based  trend  detection  is  more  efficient  than  its  main  competition — least- 
squares  regression — in  the  sense  that  wavelet  detection  requires  N  fewer  operations  to 
quantify  a  linear  trend  and  more  than  3N  fewer  operations  to  quantify  a  quadratic  trend  in  a 
series  of  N  samples  (see  Table  1).  Least-squares  regression,  on  the  other  hand,  is  more 
accurate  than  wavelet  detection  if  the  mean-square  error  of  the  estimators  is  the  criterion. 
But the factor-of-two advantage in the accuracy of the $\hat{\mu}_1$ and $\hat{\mu}_2$ estimators that least-squares regression has over wavelet detection may not be worth the factor-of-two cost in computer operations.

A trend-detection scheme is not complete unless it provides guidance on when a trend is “significant” and therefore must be removed. We base our criterion for trend removal on the resolution, r, of the instrument used to collect the data. If the coefficients of the polynomial trend predict that the variance of the original series (with trend) and the variance of the actual turbulence series (no trend) differ by at least $r^2$, the trend will significantly bias the results and must be removed.
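
Appendix A: Mean-Square Error of Wavelet Trend Detection

Using Eq. (3.21) as the definition of the mean-square error, we want to evaluate here the accuracies of $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ in the mse sense.

This analysis is easiest if we start with $\hat{\mu}_2$. From Eq. (3.21),

$$\mathrm{MSE}[\hat{\mu}_2] = E[(\hat{\mu}_2 - \mu_2)^2]. \tag{A1}$$

From Eq. (3.5), this becomes

$$\mathrm{MSE}[\hat{\mu}_2] = B^2 E\left\{\int_0^{L/3}\!\!\int_0^{L/3} ds\,dt \left[g(s) - 2g\!\left(s+\tfrac{L}{3}\right) + g\!\left(s+\tfrac{2L}{3}\right)\right]\left[g(t) - 2g\!\left(t+\tfrac{L}{3}\right) + g\!\left(t+\tfrac{2L}{3}\right)\right]\right\}. \tag{A2}$$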

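On multiplying the integrand out and defining the autocorrelation

$$C_g(s,t) \equiv E[g(s)g(t)], \tag{A3}$$

we convert Eq. (A2) to

$$\begin{aligned}
\mathrm{MSE}[\hat{\mu}_2] = B^2 \int_0^{L/3}\!\!\int_0^{L/3} ds\,dt \Big[ & C_g(s,t) - 2C_g\!\left(s,t+\tfrac{L}{3}\right) + C_g\!\left(s,t+\tfrac{2L}{3}\right) \\
& - 2C_g\!\left(s+\tfrac{L}{3},t\right) + 4C_g\!\left(s+\tfrac{L}{3},t+\tfrac{L}{3}\right) - 2C_g\!\left(s+\tfrac{L}{3},t+\tfrac{2L}{3}\right) \\
& + C_g\!\left(s+\tfrac{2L}{3},t\right) - 2C_g\!\left(s+\tfrac{2L}{3},t+\tfrac{L}{3}\right) + C_g\!\left(s+\tfrac{2L}{3},t+\tfrac{2L}{3}\right)\Big].
\end{aligned} \tag{A4}$$
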


If the statistics of g are stationary, $C_g(s,t)$ is an even function only of the difference $\tau$ between s and t, where

$$\tau = t - s. \tag{A5}$$

That is,

$$C_g(s,t) = C_g(s,s+\tau) = C_g(\tau) = C_g(-\tau). \tag{A6}$$

Also, because of this symmetry, the double integration in Eq. (A4) reduces to a single integration over $\tau$; for example, see Papoulis (1965, p. 325) or Panofsky and Dutton (1984, p. 62 ff.). That is,

$$\int_0^{L/3}\!\!\int_0^{L/3} ds\,dt \;\rightarrow\; \frac{2L}{3}\int_0^{L/3}\left(1 - \frac{3\tau}{L}\right)d\tau. \tag{A7}$$
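
The reduction in Eq. (A7) can be checked numerically. The sketch below assumes, purely for illustration, an exponential autocorrelation $C_g(\tau) = e^{-|\tau|}$ and an interval length a = L/3 = 1. Neither choice comes from the paper. For this kernel the double integral has the closed form $2(a - 1 + e^{-a})$.

```python
import math

def corr(tau):
    """Illustrative even autocorrelation (an assumption, not the paper's C_g)."""
    return math.exp(-abs(tau))

def double_integral(f, a, n=400):
    """Midpoint-rule estimate of the integral of f(t - s) over [0, a] x [0, a]."""
    h = a / n
    mids = [(i + 0.5) * h for i in range(n)]
    return sum(f(t - s) for s in mids for t in mids) * h * h

def single_integral(f, a, n=4000):
    """Midpoint-rule estimate of 2a * integral over [0, a] of (1 - tau/a) f(tau)."""
    h = a / n
    return 2.0 * a * h * sum((1.0 - tau / a) * f(tau)
                             for tau in ((i + 0.5) * h for i in range(n)))

a = 1.0                                   # plays the role of L/3
exact = 2.0 * (a - 1.0 + math.exp(-a))    # closed form for this kernel
two_dim = double_integral(corr, a)
one_dim = single_integral(corr, a)
```

Both estimates should agree with the closed form to within the accuracy of the midpoint rule, which is all the reduction claims.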
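
Therefore, Eq. (A4) becomes

$$\mathrm{MSE}[\hat{\mu}_2] = \frac{2}{3}LB^2\int_0^{L/3} d\tau\left(1-\frac{3\tau}{L}\right)\left[6C_g(\tau) - 4C_g\!\left(\tau+\tfrac{L}{3}\right) - 4C_g\!\left(\tau-\tfrac{L}{3}\right) + C_g\!\left(\tau+\tfrac{2L}{3}\right) + C_g\!\left(\tau-\tfrac{2L}{3}\right)\right]. \tag{A8}$$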


Because $C_g(\tau)$ approaches zero when $\tau$ exceeds the integral scale $\mathfrak{I}$ [see Eq. (3.22)] and because we assume that $\mathfrak{I} \ll L$, only the $6C_g(\tau)$ term contributes to this integral. As a short demonstration of this, we evaluate two of the terms in Eq. (A8). The first term is

$$\int_0^{L/3} d\tau\left(1-\frac{3\tau}{L}\right)6C_g(\tau) = 6\mathfrak{I}\sigma_g^2 \tag{A9}$$

from Eq. (3.22). The $3\tau/L$ term yields zero because where $C_g$ is nonzero (for $\tau < \mathfrak{I}$), $3\tau/L \approx 0$; and where $3\tau/L$ is large (for $3\tau \sim L \gg \mathfrak{I}$), $C_g$ is zero.

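As another example, look at the second term in Eq. (A8);

$$\int_0^{L/3} d\tau\left(1-\frac{3\tau}{L}\right)4C_g\!\left(\tau+\frac{L}{3}\right) = 4\int_{L/3}^{2L/3} d\tau'\left(2-\frac{3\tau'}{L}\right)C_g(\tau'), \tag{A10}$$

where we have made the change of variables $\tau' = \tau + L/3$. But the right side of Eq. (A10) is zero because $\mathfrak{I} \ll L$.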

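By similar methods, we can show that the other three terms in Eq. (A8) also integrate to zero. Therefore,

$$\mathrm{MSE}[\hat{\mu}_2] = \frac{2}{3}LB^2\left(6\mathfrak{I}\sigma_g^2\right). \tag{A11}$$

Using Eq. (2.4) for B, we see

$$\mathrm{MSE}[\hat{\mu}_2] = \frac{729\,\mathfrak{I}\sigma_g^2}{L^5}. \tag{A12}$$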

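Next, we evaluate the mse of $\hat{\mu}_1$;

$$\mathrm{MSE}[\hat{\mu}_1] = E[(\hat{\mu}_1 - \mu_1)^2]. \tag{A13}$$

From Eq. (3.12),

$$\begin{aligned}
\mathrm{MSE}[\hat{\mu}_1] = {} & A^2 E\left\{\int_0^{L/2}\!\!\int_0^{L/2} ds\,dt\left[g\!\left(s+\tfrac{L}{2}\right)-g(s)\right]\left[g\!\left(t+\tfrac{L}{2}\right)-g(t)\right]\right\} \\
& - 2AL\,E\left\{(\hat{\mu}_2-\mu_2)\int_0^{L/2} dt\left[g\!\left(t+\tfrac{L}{2}\right)-g(t)\right]\right\} + L^2 E[(\hat{\mu}_2-\mu_2)^2].
\end{aligned} \tag{A14}$$
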

The third term is just

$$\text{3rd term} = L^2\,\mathrm{MSE}[\hat{\mu}_2] = \frac{729\,\mathfrak{I}\sigma_g^2}{L^3}. \tag{A15}$$

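Using Eq. (A3), (A6), and (A7), we can show that the first term in Eq. (A14) is

$$\text{1st term} = LA^2\int_0^{L/2} d\tau\left(1-\frac{2\tau}{L}\right)\left[2C_g(\tau) - C_g\!\left(\tau-\tfrac{L}{2}\right) - C_g\!\left(\tau+\tfrac{L}{2}\right)\right]. \tag{A16}$$

Again, because $C_g(\tau)$ goes to zero rapidly when $\tau$ exceeds $\mathfrak{I}$, and because $\mathfrak{I} \ll L$, only the first term in Eq. (A16) contributes to the integral. We find

$$\text{1st term} = LA^2\left(2\mathfrak{I}\sigma_g^2\right) = \frac{32\,\mathfrak{I}\sigma_g^2}{L^3}. \tag{A17}$$
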


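From Eq. (3.5), the middle term in Eq. (A14) is

$$\text{middle term} = -2ABL\,E\left\{\int_0^{L/3} ds\int_0^{L/2} dt\left[g(s)-2g\!\left(s+\tfrac{L}{3}\right)+g\!\left(s+\tfrac{2L}{3}\right)\right]\left[g\!\left(t+\tfrac{L}{2}\right)-g(t)\right]\right\}. \tag{A18}$$

From Eq. (A3), this becomes

$$\begin{aligned}
\text{middle term} = -2ABL\int_0^{L/3} ds\int_0^{L/2} dt\Big[ & C_g\!\left(s,t+\tfrac{L}{2}\right) - C_g(s,t) - 2C_g\!\left(s+\tfrac{L}{3},t+\tfrac{L}{2}\right) + 2C_g\!\left(s+\tfrac{L}{3},t\right) \\
& + C_g\!\left(s+\tfrac{2L}{3},t+\tfrac{L}{2}\right) - C_g\!\left(s+\tfrac{2L}{3},t\right)\Big].
\end{aligned} \tag{A19}$$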


On  using  repeated  variable  transformations,  we  convert  Eq.  (A19)  to 


middle  term  =  -2ABL|jj^^^^  dsdt  Cg  s,t+t  -  Cg(s,t)  -  2Cg|^s+— ,t+— 


r  L  ^ 

^  r  2L 
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+  -  Cg[s,t+|j  -  2Cg(^s+t 


,  L  ^  f  2L  5h\  ^  f  2L  L 

+  2Cg  SH - ,tH - +  Cg  SH - ,tH - —  Cg  SH - ,tH - 

3  3)  H  3  6  H  3  3 


T  'ST^  /^T  T^  /'T  ST 

+  CglS  +  -,t+—  -  Cg  S  +  -,t  +  -  -  2Cg  S  +  -,t  +  — 

6  6j  6  n  2  6 


+  2Cgls+|,t+|]  +  Cg(s+^,t+f )  -  Cg(s+^,t+| 


.(A20) 


Invoking  Eq.  (A6)  and  (A7),  we  again  convert  the  integrations  in  s  and  t  to  a  single 
integration  in  v. 


middle  term  =  -2ABL^  —  f  dx  1 - 

I  3  ^0  ^  L 


r^,L/3,_r,  3x 


Cgh+f)  -  Cg(T)  -  2Cgfx+t 

From Eq. (A9), we expect that the two $C_g(\tau)$ terms will contribute to these integrals. We get


-Jo  dx 


V  L, 


C  (x)  =  -Sal 


(A22) 


and 


^fL/6  , 

3Jo  dx 


1-^ 

V  Ly 


Cg(x)  =  33a2. 


(A23) 


The  fifth  term  in  Eq.  (A28)  is 


Now  look  at  the  second  term  in  Eq.  (A28).  From  Eq.  (3.12),  we  see  that  this  is 
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2nd  term  =  -AE  dsg(s)J^‘''^  dt  -  g(t) 

+  LE[(ii2-H2)jJ'^^dsg(s)].  (A36) 

The second term here differs only by a multiplicative factor from the third term in Eq. (A28), which we just showed was zero. Hence,

2nd  term  =  -Aj(,‘'ds/J''^  dt  Cg[^s,t+|j  -  Cg(s,t)  .  (A37) 

Again  dividing  the  integration  over  s  into  two  segments,  transforming  variables,  and 
invoking  Eq.  (A6)  and  (A7),  we  convert  Eq.  (A37)  to 

2nd  term  =  -ALJ^*-'"  -  C,(x)  +  C^{l)  -  C,(t-|]].(A38) 

This  also  integrates  to  zero; 

2nd  term  =  0  .  (A39) 

Finally,  we  tackle  the  fourth  term  in  Eq.  (A28).  From  Eq.  (3.12),  this  is 

4th  term  =  ^AL^E  (^i2-^t2)Jy^  *  g|^t+tj-g(t)  -  tL‘‘E[(ji2-H2f  ]•  (A40) 

We  recognize  MSE[p2]  in  the  second  term  on  the  right-hand  side  of  this.  Thus,  using  Eq. 
(3.5), 

4th  term  =  iABL3Ej^^'^ds|^‘''^dt  g(s)-2g|^s+tj+g|^s+^j  g(^t+tj-g(t) 

-  tL‘'MSE[A2].  (A41) 

But  we  already  evaluated  the  double  integral  in  the  first  term  of  Eq,  (A41)  starting  at 
Eq.  (A18).  Consequently, 
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4th  term  =  iABL3(L3aJ)  -  .  (A42) 

Finally,  putting  Eq.  (A29-A31),  (A35),  (A39),  and  (A42)  together  in  Eq.  (A28)  gives 
MSE[^o]  =1^2  +  0  +  0  -  225  +^+81  =  21.25(^|-jag  .  (A43) 
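
Appendix B: Mean-Square Error of Least-Squares Trend Detection

The basis of least-squares trend detection is minimizing the total squared difference between the measured values and the fitted polynomial trend. In our application, g(t) is the measured series, and

$$\varepsilon^2 = \int_0^L dt\left[g(t) - \hat{\mu}_{0,LS} - \hat{\mu}_{1,LS}t - \hat{\mu}_{2,LS}t^2\right]^2 \tag{B1}$$

is the total squared difference between the measurements and the predicted trend.

Minimizing $\varepsilon^2$ yields the least-squares estimators, $\hat{\mu}_{0,LS}$, $\hat{\mu}_{1,LS}$, and $\hat{\mu}_{2,LS}$. We require

$$\frac{\partial\varepsilon^2}{\partial\hat{\mu}_{0,LS}} = 0 = -2\int_0^L dt\left[g(t) - \hat{\mu}_{0,LS} - \hat{\mu}_{1,LS}t - \hat{\mu}_{2,LS}t^2\right], \tag{B2a}$$

$$\frac{\partial\varepsilon^2}{\partial\hat{\mu}_{1,LS}} = 0 = -2\int_0^L dt\,t\left[g(t) - \hat{\mu}_{0,LS} - \hat{\mu}_{1,LS}t - \hat{\mu}_{2,LS}t^2\right], \tag{B2b}$$

$$\frac{\partial\varepsilon^2}{\partial\hat{\mu}_{2,LS}} = 0 = -2\int_0^L dt\,t^2\left[g(t) - \hat{\mu}_{0,LS} - \hat{\mu}_{1,LS}t - \hat{\mu}_{2,LS}t^2\right]. \tag{B2c}$$

Henceforth, we drop the subscript LS since this whole appendix treats only the least-squares estimators.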

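Equations (B2) integrate easily to yield

$$\frac{1}{L}\int_0^L g(t)\,dt = \hat{\mu}_0 + \frac{1}{2}\hat{\mu}_1L + \frac{1}{3}\hat{\mu}_2L^2, \tag{B3a}$$

$$\frac{1}{L^2}\int_0^L t\,g(t)\,dt = \frac{1}{2}\hat{\mu}_0 + \frac{1}{3}\hat{\mu}_1L + \frac{1}{4}\hat{\mu}_2L^2, \tag{B3b}$$

$$\frac{1}{L^3}\int_0^L t^2g(t)\,dt = \frac{1}{3}\hat{\mu}_0 + \frac{1}{4}\hat{\mu}_1L + \frac{1}{5}\hat{\mu}_2L^2. \tag{B3c}$$

This set has an obvious matrix interpretation;

$$\begin{pmatrix} 1 & \tfrac12 & \tfrac13 \\ \tfrac12 & \tfrac13 & \tfrac14 \\ \tfrac13 & \tfrac14 & \tfrac15 \end{pmatrix}\begin{pmatrix}\hat{\mu}_0 \\ \hat{\mu}_1L \\ \hat{\mu}_2L^2\end{pmatrix} = \begin{pmatrix} \frac{1}{L}\int_0^L g(t)\,dt \\ \frac{1}{L^2}\int_0^L t\,g(t)\,dt \\ \frac{1}{L^3}\int_0^L t^2g(t)\,dt \end{pmatrix}. \tag{B4}$$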


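We can invert the 3 x 3 matrix readily to obtain solutions for $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{\mu}_2$ (e.g., Bevington 1969, p. 134 ff.);

$$\hat{\mu}_0 = \frac{9}{L}\int_0^L\left(1 - \frac{4t}{L} + \frac{10t^2}{3L^2}\right)g(t)\,dt, \tag{B5}$$

$$\hat{\mu}_1 = -\frac{36}{L^2}\int_0^L\left(1 - \frac{16t}{3L} + \frac{5t^2}{L^2}\right)g(t)\,dt, \tag{B6}$$

$$\hat{\mu}_2 = \frac{30}{L^3}\int_0^L\left(1 - \frac{6t}{L} + \frac{6t^2}{L^2}\right)g(t)\,dt. \tag{B7}$$

These then are the estimates for the polynomial trend coefficients that least-squares regression would yield. Let us investigate their statistical properties.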

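Look first at $\hat{\mu}_0$. In Eq. (B5), substitute Eq. (3.1) for g(t);

$$\hat{\mu}_0 = \frac{9}{L}\int_0^L dt\left(1 - \frac{4t}{L} + \frac{10t^2}{3L^2}\right)\left[g(t) + \mu_0 + \mu_1t + \mu_2t^2\right]. \tag{B8}$$

This readily integrates to

$$\hat{\mu}_0 = \frac{9}{L}\int_0^L\left(1 - \frac{4t}{L} + \frac{10t^2}{3L^2}\right)g(t)\,dt + \mu_0. \tag{B9}$$

Because E[g(t)] = 0, we see that

$$E[\hat{\mu}_0] = \mu_0. \tag{B10}$$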
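
The step from the substituted integral to the unbiased result rests on three moments of the least-squares weight $(9/L)(1 - 4t/L + 10t^2/3L^2)$: it must integrate to 1 against the constant term and to 0 against t and $t^2$. A minimal sketch in exact rational arithmetic confirms this. The function name and the sample value of L are illustrative assumptions.

```python
from fractions import Fraction

# Coefficients of the weight w(t) = (9/L) * (1 - 4 t/L + 10 t^2 / (3 L^2)),
# written as c0/L + c1 t/L^2 + c2 t^2/L^3.
WEIGHT = [Fraction(9), Fraction(-36), Fraction(30)]

def weighted_moment(k, L):
    """Exact value of the integral over [0, L] of t**k * w(t)."""
    L = Fraction(L)
    # Each term integrates to c_i * L**k / (k + i + 1).
    return L ** k * sum(c / (k + i + 1) for i, c in enumerate(WEIGHT))

L = 7                      # arbitrary record length for the check
m0 = weighted_moment(0, L)  # picks up mu_0
m1 = weighted_moment(1, L)  # multiplies mu_1
m2 = weighted_moment(2, L)  # multiplies mu_2
```

The linear and quadratic trend terms therefore drop out exactly, leaving only the constant coefficient and the integral over the zero-mean turbulence term.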

Make the change of variables

$$\tau = t - s, \tag{B18}$$

$$w = \tfrac{1}{2}(t + s). \tag{B19}$$

This requires

$$\int_0^L\!\!\int_0^L ds\,dt \;\rightarrow\; \int_{-L}^{L} d\tau\int_{|\tau|/2}^{L-|\tau|/2} dw. \tag{B20}$$


Also,  with  Eq.  (A3)  and  (A6),  Eq.  (B17)  transforms  to 
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On  integrating  over  w  in  Eq.  (B21),  we  obtain 
MSE[m.o]  =  ^j_''^^dTCg(T)i 


W^ 

- - 1 - 
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(B21) 

We realize now that when we integrate over $\tau$, any term in Eq. (B22) containing a $\tau$ will integrate to zero because $\mathfrak{I} \ll L$ and $C_g(\tau)$ rapidly approaches zero when $\tau$ exceeds $\mathfrak{I}$. Thus, in effect, we can take the integration limits for w to be 0 and L. With this simplification,

MSE[ii„]  =  J  (T)[l  -  4  + 1  -  f  +  ^]  =  dtC,  (t)  .  (B23) 

Hence,  from  Eq.  (3.22) 


MSE[Ao]  =  • 


(B24) 

From Eq. (B12),


MSE[|1i]  =  ^  E 


36 


16s  5s^ 


16t  5t^ 


3L 


g(s)g(t) 


.  (B25) 


On  multiplying  this  out,  using  the  variable  transformations  given  in  Eq.  (B18-B20),  and 
invoking  Eq.  (A3),  we  obtain 

MSE[|1,]  =  dwCg(T) 
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32w  346 83t^  160w^  40 wt^  25  T  a 

3L  9L^  ISL^  3L^  3L^  2  16/ 


(B26) 


Integrating  over  w  yields 


MSE[fL,]  = 

But, as with Eq. (B22), we recognize that any term in the w-$\tau$ polynomial containing a $\tau$ will integrate to zero when we do the next integration over $\tau$. Consequently, we can replace the limits $|\tau|/2$ and $L - |\tau|/2$ with 0 and L. Equation (B27) then becomes


I  2592  ^  16  346  40  _ 

MSE[n,]  =  _|^dtCg(t)ll-y  +  — -y  +  5 


384^3^  2 


Finally, we evaluate the mse of $\hat{\mu}_2$. From Eq. (B14),

Two dimensional spectral estimation of nonstationary processes is studied. Currently nonstationary phenomena are usually modeled and analysed as if they were stationary. The spectrum of a nonstationary process is two dimensional while that of a stationary process is one dimensional. The usual one dimensional spectrum and the more complete two dimensional spectrum of some nonstationary processes are compared. In addition, a random phase shift, which as the theory shows should produce a stationary process, is introduced to our nonstationary process and the resulting stationary process is examined. These results are also compared with the method described above.


1  Introduction 

In recent years interest has grown in nonstationary processes for modeling physical phenomena. This is due to the fact that most physical phenomena exhibit nonstationary behavior. Currently these phenomena are generally modeled as if they were stationary, because all aspects of the theory for the representation of stationary processes are quite well developed. However, if these phenomena are to be better understood, they must be modeled as nonstationary. One aspect which distinguishes a nonstationary process from a stationary process is its power spectrum. Nonstationary processes have a two dimensional power spectrum while stationary processes have a one dimensional power spectrum. For nonstationary processes the theory is not complete, and some basic questions such as interpretation of their power spectrum still need to be investigated. In this paper, a two dimensional estimate of the power spectral density of a nonstationary process will be discussed. The estimate will then be applied to some helicopter noise data which is clearly nonstationary. The results of this nonstationary analysis history will also be applied to our helicopter noise data.

The  paper  is  organized  as  follows:  Section  2  discusses  the  necessary  background 
material,  including  the  definition  of  three  useful  classes  of  nonstationary  processes. 




Section 3 discusses a method for determining the period of a periodically correlated process when the period is not known. The period of such a process must be known before one can attempt to produce a good estimate of the power spectrum. In Section 4, a previously developed method is used to obtain the power spectrum estimate. The bias and variability of the estimate are also discussed. Finally, in Section 5, the current method for analyzing nonstationary data is compared to the two dimensional spectral analysis developed in this paper. Conclusions are then drawn regarding the comparison of these methods.

2  Preliminaries  and  Background 

A stochastic process or random process is a family of random variables $\{X(\lambda) : \lambda \in \Lambda\}$, where $\Lambda$ is the index set of the parameter $\lambda$. Usually the index set is either the set Z of all integers, in which case the process is called discrete, or the set R of real numbers, in which case the process is called continuous. Throughout this paper the procedures developed are for discrete processes. This is because the actual data used to produce the spectral estimate are discrete, a consequence of the necessity of sampling.

A stochastic process X(t) is called stationary if its covariance function

$$R(t,s) = E[X(t)X(s)], \quad t,s \in Z \tag{1}$$

satisfies the relation

$$R(t,s) = R(t+1,s+1),$$

for all $t,s \in Z$. A stochastic process X(t) is called periodically correlated (PC, in short) of period T if its covariance function satisfies

$$R(t,s) = R(t+T,s+T),$$

for all $t,s \in Z$, or if its covariance function written in the form

$$r_\tau(t) = R(t+\tau,t) = E[X(t+\tau)X(t)]$$

is periodic in t with period T. (A PC process with period T = 1 is simply stationary.) Since $r_\tau(t)$ is periodic in t with period T, one can write (for this and other results on PC processes see the references):

$$r_\tau(t) = \sum_{k=0}^{T-1} R_k(\tau)\exp\left(\frac{2\pi ikt}{T}\right). \tag{2}$$

For convenience we usually extend the definition of the functions $R_k(\tau)$, k = 0, 1, ..., T-1, to all integers by letting $R_k(\tau) = R_{k+T}(\tau)$. It is well-known that each $R_k(\tau)$ has the representation

$$R_k(\tau) = \int_0^{2\pi} e^{i\tau\lambda}\,dF_k(\lambda),$$

where each $F_k$ is a measure on $(0,2\pi]$. One can then write
r^(t)  = 

with  the  two  dimensional  power  spectrum  F(., .)  of  the  process  X{t)  being 

(B-a is the set of all b-a's with b in B). In other words, the spectrum F(.,.) is concentrated on 2T-1 straight line segments $\lambda - \theta = 2\pi k/T$, k = 1-T, ..., T-1, contained inside the square $T^2 = (0,2\pi] \times (0,2\pi]$, with the measure $F_k(.)$ representing the mass of F(.,.) on the k-th segment. From the above discussion, taking T = 1, we see that the power spectrum of any stationary process is concentrated on the diagonal of the square $T^2$. Thus we recover the well-known one-dimensional spectrum of the stationary process, considered now as a two-dimensional spectrum. PC processes have been of interest because of their relatively simple structure, which stems from their very close tie with stationary processes. In fact, it is well-known that if X(t) is a PC process with period T, then the T-variate process formed from consecutive blocks of length T of X(t) is a multivariate stationary process. This has made it possible to employ properties of this associated stationary process and develop some structural properties of the PC process. However, their close tie to stationary processes limits their modeling potential for nonstationary data. So, it is desirable to consider some richer classes of nonstationary processes. We now discuss one such class of nonstationary processes, which was introduced by Hardin and Miamee.

Definition 1 A zero mean stochastic process X(t) is called covariance autocorrelated (CAR, in short) if there exists some finite set $\{a_j : j = 1, 2, ..., k\}$ of scalars such that

$$R(t,s) = \sum_{j=1}^{k} a_jR(t+j,s+j), \quad \text{for all } t,s \in Z. \tag{3}$$

Here are some examples of CAR processes.

1. Recall that a process is stationary if its covariance function satisfies

$$R(t,s) = R(t+1,s+1)$$

and hence it is clearly CAR. In fact, in (3) one can take $a_1$ to be 1 and the rest of the $a_j$'s to be zero.

2. Similarly one can check that a PC process with period T is CAR, by taking k = T, $a_T = 1$ and the rest of the $a_j$'s zero.

3. Let X(t) be a stationary process and let a be a real number other than 1; then one can see that the following two processes

$$Y(t) = tX(t) \quad \text{and} \quad Z(t) = a^tX(t)$$

are both CAR. For the first one, take k = 3, $a_1 = 3$, $a_2 = -3$, and $a_3 = 1$; these values follow because the third difference in j of the quadratic (t+j)(s+j) vanishes. Note that the CAR processes in example 3 are neither stationary nor PC. This is because, in each of these cases, the power of the process approaches infinity for large t's, while for stationary and PC processes this quantity stays bounded.

4.  Taking  X(t)  again  to  be  a  stationary  process  one  can  prove  that  the  modulated 
processes 

U{t)  =  {a  +  bf)tX{t)  and  V{t)  =  tatX{t) 
are  also  CAR  (  we  omit  the  proof  ). 
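
The CAR relation of Definition 1 can be verified directly for the processes of example 3. The sketch below assumes, for illustration only, a stationary covariance $r(\tau) = 0.8^{|\tau|}$ and the value a = 1.1. It uses $a_1 = 3$, $a_2 = -3$, $a_3 = 1$ for $Y(t) = tX(t)$, which follows from the vanishing third difference of (t+j)(s+j). For $Z(t) = a^tX(t)$ it uses k = 1 with $a_1 = a^{-2}$, a coefficient derived here rather than quoted from the text.

```python
def r(tau, rho=0.8):
    """Illustrative stationary covariance (an assumption, not from the paper)."""
    return rho ** abs(tau)

def cov_Y(t, s):
    """Covariance of Y(t) = t X(t)."""
    return t * s * r(t - s)

def cov_Z(t, s, a=1.1):
    """Covariance of Z(t) = a**t X(t)."""
    return a ** (t + s) * r(t - s)

def car_rhs(cov, t, s, coeffs):
    """Right-hand side of the CAR relation: sum_j a_j R(t + j, s + j)."""
    return sum(a_j * cov(t + j, s + j) for j, a_j in enumerate(coeffs, start=1))

grid = range(-5, 6)
err_Y = max(abs(cov_Y(t, s) - car_rhs(cov_Y, t, s, (3, -3, 1)))
            for t in grid for s in grid)
err_Z = max(abs(cov_Z(t, s) - car_rhs(cov_Z, t, s, (1.1 ** -2,)))
            for t in grid for s in grid)
```

Both residuals should vanish to rounding error, so each process satisfies the relation with a finite coefficient set even though neither is stationary or PC.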

Another  important  class  of  processes  is  that  of  harmonizable  processes,  which  we 
briefly  discuss  below. 

Definition  2  A  stochastic  process  X{t)  is  called  harmonizable  if  its  covariance  func¬ 
tion  has  the  following  harmonic  ( or  Fourier  )  representation 

R{i,  s)  =  If  A), 

where  the  measure  F  defined  on  the  square  is  its  spectral  measure. 

In other words, we say a random process X(t) is harmonizable if the double Fourier transform of its covariance function

$$f(\lambda,\theta) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty} dt\int_{-\infty}^{\infty} ds\,R(t,s)e^{-i(\lambda t-\theta s)} \tag{4}$$

exists. If this is the case, then f(.,.) is called the power spectral density of the process X(t). For a stationary process, replacing R(t,s) by r(t-s) and substituting $\tau$ for t-s, (4) simplifies to

$$f(\lambda,\theta) = f(\lambda)\,\delta(\lambda-\theta),$$

where $\delta(.)$ is the Dirac delta function, and

$$f(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} r(\tau)e^{-i\lambda\tau}\,d\tau.$$

This results in a one dimensional spectral density.

We showed that any stationary process is harmonizable and its (two dimensional) spectral measure is concentrated on the diagonal $D = \{(\lambda,\theta) : \lambda = \theta\}$ of the square $T^2$. One can similarly show that any PC process with period T is harmonizable with its spectral measure being concentrated on 2T-1 equidistant line segments parallel to the diagonal of that square, i.e. on the line segments

$$D_k = \{(\theta,\lambda) : \theta = \lambda + 2\pi k/T\}, \quad k = -T+1, ..., T-1.$$

Now we present the following theorem, which is essential for any application of CAR processes (for its proof and other properties of CAR processes the reader is referred to the literature).

Theorem 1 The power spectrum of any harmonizable CAR process is concentrated on a finite number of straight line segments parallel to (but not necessarily equidistant from) the main diagonal of the square $T^2$.


3  Estimating  period  of  a  PC  process 


In this section, a method will be examined for determining the period of a PC process from its given data. The method entails finding the lines of spectral support for the PC process and then using the distance between these lines to determine the period.

The discrete data we analyze here were taken from an acoustic time history produced by a helicopter that was fixed with respect to the observer. The passing blades from an isolated main rotor or isolated tail rotor will produce a periodic sound pressure time history. This time history will have the same period as that of the passing blades which produced it. The history produced by the main or tail rotor rotating alone represents a PC process whose covariance function has the same period as the time history. Recall that the covariance function of a PC process is periodic with a period T and has an associated frequency f. It is this period that we want to find.

Hurd developed a useful technique for determining the spectral support of a nonstationary process. The technique consists of first producing the discrete Fourier transform $X_k$ from the sampled data. Then products $X_p\overline{X}_q$ of these transforms are obtained and plotted in the (p,q) plane. Subsets of these products are then summed along the diagonal and normalized with respect to the main diagonal. That is, a spectral coherence is produced at coordinates (p,q), where p and q correspond to the $2\pi p/N\Delta t$ and $2\pi q/N\Delta t$ frequencies respectively ($\Delta t$ being the time between subsequent samples). The coherence is produced by

$$|\gamma(p,q,M)|^2 = \frac{\left|\sum_{m=0}^{M-1} X_{p+m}\overline{X}_{q+m}\right|^2}{\sum_{m=0}^{M-1}|X_{p+m}|^2\,\sum_{m=0}^{M-1}|X_{q+m}|^2}.$$



This coherence is used to determine which points over the array being considered have significant values. This is determined by choosing a threshold value and plotting points (p,q) for which the coherence exceeds the threshold. If the process is PC, the plotted points (p,q) should produce a graph (see for example figure 3) of dots along lines parallel to the diagonal. The separation between these parallel lines can be used to see whether the process is PC or not, and to find the period of our process.
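
A minimal sketch of the coherence computation follows. It assumes a naive discrete Fourier transform, indices that wrap around modulo N, and a synthetic amplitude-modulated noise series standing in for real data. None of these choices is taken from the paper, and the function names are hypothetical.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative series)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def coherence(X, p, q, M):
    """Squared spectral coherence at (p, q), summing M products along the diagonal.

    Assumption: indices p + m and q + m wrap around modulo len(X).
    """
    n = len(X)
    cross = sum(X[(p + m) % n] * X[(q + m) % n].conjugate() for m in range(M))
    power_p = sum(abs(X[(p + m) % n]) ** 2 for m in range(M))
    power_q = sum(abs(X[(q + m) % n]) ** 2 for m in range(M))
    return abs(cross) ** 2 / (power_p * power_q)

# Synthetic PC-like series: white noise modulated with period T samples.
random.seed(0)
N, T = 64, 8
x = [(1.0 + 0.8 * math.cos(2 * math.pi * t / T)) * random.gauss(0.0, 1.0)
     for t in range(N)]
X = dft(x)

on_diagonal = coherence(X, 5, 5, 8)
off_diagonal = coherence(X, 5, 5 + N // T, 8)
```

By the Cauchy-Schwarz inequality the statistic lies between 0 and 1, and it equals 1 on the main diagonal, so the threshold described above is a number in that range. For a series with period T samples, support lines would be expected at offsets that are multiples of N/T bins.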

This technique is based on the theoretical result discussed in Section 2, which says that the spectral support for a PC process with period T is on 2T-1 equidistant line segments parallel to the main diagonal of the square $T^2$, that is, on the line segments

$$D_k = \{(\lambda,\theta) : \lambda = \theta + 2\pi k/T\}, \quad k = 0, 1, ..., (T-1).$$

Taking the case with k = 1, the spacing of these lines is given by

$$p - q = \frac{2\pi}{T};$$

letting $T = n\Delta t$ produces

$$p - q = \frac{2\pi}{n\Delta t}.$$

The difference p-q is found from the graph. Thus the number n of time intervals $\Delta t$ needed to produce the period of the process can be calculated directly. After the period has been found, our next task is to estimate the power spectral density.
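
As a small worked sketch of this last step, suppose the spacing between adjacent support lines is read off the graph as d frequency bins of an N-point transform. Since bin p corresponds to the frequency $2\pi p/N\Delta t$, the spacing is $2\pi d/N\Delta t$, and equating it to $2\pi/n\Delta t$ gives n = N/d. The function names and the sample numbers below are illustrative assumptions.

```python
def samples_per_period(n_samples, bin_spacing):
    """Number n of sampling intervals per period, from the line spacing in DFT bins."""
    if bin_spacing <= 0:
        raise ValueError("bin_spacing must be positive")
    return n_samples / bin_spacing

def period_seconds(n_samples, bin_spacing, dt):
    """Period T = n * dt of the process, given the sampling interval dt in seconds."""
    return samples_per_period(n_samples, bin_spacing) * dt

n = samples_per_period(1024, 16)           # support lines 16 bins apart
period = period_seconds(1024, 16, 0.001)   # hypothetical 1 kHz sampling
```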

4  Power  Spectral  Estimation 

A random variable $\hat{\lambda}$ which is an estimate for an unknown parameter $\lambda$ is called unbiased if

$$E(\hat{\lambda}) = \lambda.$$

In addition, an estimate is of interest if its uncertainty is as small as possible. The uncertainty of an estimate is measured by its standard deviation. That is,

$$\sigma = \left(E[(\hat{\lambda} - \lambda)^2]\right)^{1/2}.$$

Therefore, the usual requirements are that the estimate be unbiased and have the smallest possible standard deviation $\sigma$.

For stationary processes there are two well known techniques for spectral estimation: the Blackman-Tukey and the finite Fourier transform techniques. Here, to produce an estimate for the power spectral density, we have chosen the finite Fourier transform technique. This method consists of taking the discrete Fourier transform of sampled data and using the transformed data to produce a spectral estimate. Consider a stationary process X(t) of which the (generalized) Fourier transform is given by

$$X(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(t)e^{-i\lambda t}\,dt.$$

So, its covariance function is given as

$$E[X(\lambda)\overline{X(\theta)}] = \frac{1}{4\pi^2}\int_{-\infty}^{\infty} dt\int_{-\infty}^{\infty} ds\,R(t-s)e^{-i(\lambda t-\theta s)}.$$

Changing variables (t+s)/2 and s-t to t and $\tau$, respectively, produces

$$E[X(\lambda)\overline{X(\theta)}] = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\tau\,R(\tau)e^{i\lambda\tau}\,\delta(\lambda-\theta).$$

Therefore,

$$E|X(\lambda)|^2 = f(\lambda).$$

In practical situations only a single sample function of finite length of a random process X(t) is available. Based on the relationship just obtained, a class of power spectral estimates

$$\hat{f}(\lambda) = W_s|\hat{X}(\lambda)|^2$$

is introduced, where

$$\hat{X}(\lambda) = \frac{1}{2\pi}\int_0^T n(t)X(t)e^{-i\lambda t}\,dt \tag{5}$$

is the Fourier transform of the data as seen through a data window n(t). This data window is a real valued function that is zero for t < 0 and t > T, so that unavailable data are not required. $W_s$ is a correction factor, due to the presence of the window. The estimate $\hat{f}(\lambda)$ for a fixed $\lambda$ is a random variable with mean


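$$E[\hat{f}(\lambda)] = W_sE|\hat{X}(\lambda)|^2 = \frac{W_s}{4\pi^2}\int_0^T dt\int_0^T ds\,n(t)n(s)R(t-s)e^{-i\lambda(t-s)}.$$

Furthermore,

$$E[\hat{f}(\lambda)] = \frac{W_s}{4\pi^2}\int_{-\infty}^{\infty} dt\int_{-\infty}^{\infty} d\tau\,n(t+\tau/2)\,n(t-\tau/2)\,R(\tau)e^{-i\lambda\tau}$$

with $t - \tau/2$ and $t + \tau/2$ being substituted for s and t, respectively. Here

$$u(\tau) = \frac{W_s}{2\pi}\int_{-\infty}^{\infty} n(t+\tau/2)\,n(t-\tau/2)\,dt$$

is a lag window satisfying the following conditions:

1. $u(0) = 1$, for preserving power

2. $u(\tau) = u(-\tau)$, which makes $\hat{f}(\lambda)$ real

3. $u(\tau) = 0$ for $|\tau| > T$.

For the first condition to hold we arrive at the requirement

$$W_s = \frac{2\pi}{\int_0^T n^2(t)\,dt}.$$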

Therefore, the estimate becomes

$$\hat{f}(\lambda) = \frac{2\pi|\hat{X}(\lambda)|^2}{\int_0^T n^2(t)\,dt}.$$

The second condition, namely $u(\tau)$ being even, is obviously satisfied. The third condition is also satisfied since $u(\tau)$ is the convolution of two data windows that are only nonzero in the interval (0,T).
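
A discrete-time sketch of this windowed estimate is given below. It assumes unit sample spacing and replaces the integrals by sums, so the correction factor becomes $W_s = 2\pi/\sum_t n_t^2$. The function names and the unit-impulse test series are illustrative assumptions, not part of the paper.

```python
import cmath
import math

def windowed_transform(x, window, lam):
    """Discrete analog of Eq. (5): (1/2pi) * sum_t n_t x_t exp(-i lam t)."""
    total = sum(n_t * x_t * cmath.exp(-1j * lam * t)
                for t, (n_t, x_t) in enumerate(zip(window, x)))
    return total / (2.0 * math.pi)

def spectral_estimate(x, window, lam):
    """Finite Fourier transform estimate W_s |X_hat(lam)|^2 with W_s = 2pi / sum n_t^2."""
    w_s = 2.0 * math.pi / sum(n_t * n_t for n_t in window)
    return w_s * abs(windowed_transform(x, window, lam)) ** 2

# Unit impulse seen through a boxcar window of length 8: the estimate is flat
# in frequency and equals 1 / (2 pi * 8) = 1 / (16 pi).
impulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
boxcar = [1.0] * 8
flat_a = spectral_estimate(impulse, boxcar, 0.7)
flat_b = spectral_estimate(impulse, boxcar, 2.1)
```

With this normalization a unit-variance white series would have an expected estimate of $1/2\pi$ at every frequency, which is the sense in which the factor $W_s$ preserves power.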

The estimate developed above is equivalent in expectation to a class of estimates developed by Blackman and Tukey, in which

$$\hat{f}(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} u(\tau)\hat{R}(\tau)e^{-i\lambda\tau}\,d\tau.$$

Here $\hat{R}(\tau)$ is an estimate of the covariance function of the process and $u(\tau)$ is a lag window as described above. The expectation of the estimate is

$$E[\hat{f}(\lambda)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} u(\tau)R(\tau)e^{-i\lambda\tau}\,d\tau.$$

This mean spectral estimate can be shown to equal the convolution of the actual spectral density with a "spectral window", which is merely the Fourier transform of the lag window. Since the covariance is an even function, one can see that

$$E[\hat{f}(\lambda)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} u(\tau)R(\tau)\cos\tau\lambda\,d\tau = f(\lambda) - \frac{1}{\pi}\int_T^{\infty} R(\tau)\cos\tau\lambda\,d\tau$$

for the case of a "boxcar" lag window, $u(\tau) = 1$ for $|\tau| \le T$. Therefore, the estimate for the power spectral density is biased, but it becomes unbiased as $T \to \infty$.

In a similar fashion to the first technique, we suggest the following procedure for estimating the power spectral density of nonstationary processes. The power spectral density for a nonstationary process can be written as

$$f(\lambda,\theta) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty} ds\int_{-\infty}^{\infty} dt\,R(t,s)e^{-i(\lambda t-\theta s)} = E[X(\lambda)\overline{X(\theta)}],$$

where

$$X(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(t)e^{-i\lambda t}\,dt.$$

Here $f(\lambda,\theta)$ cannot be simplified to $f(\lambda)$ since the covariance function R(t,s) depends on both variables t and s. Therefore, from a sample function of length T of a nonstationary process X(t), a class of power spectral estimates $\hat{f}(\lambda,\theta) = W_s\hat{X}(\lambda)\overline{\hat{X}(\theta)}$ can be introduced, where $\hat{X}(\lambda)$ is exactly as introduced for the stationary case in (5). The mean of the estimate is

$$E[\hat{f}(\lambda,\theta)] = \frac{W_s}{4\pi^2}\int_0^T dt\int_0^T ds\,n(t)n(s)R(t,s)e^{-i(\lambda t-\theta s)}.$$

The above analysis, which shows that the estimate is biased when $\lambda = \theta$ in the stationary case, can be seen to hold when $\lambda$ is not equal to $\theta$. It can be shown that the spectral estimate of the finite Fourier transform, in the stationary case, is essentially a chi-square random variable with two degrees of freedom. Therefore, variability of the estimate can be reduced by breaking the data into $N_B$ blocks of length $T_B$ such that $N_BT_B = T$.

A spectral estimate $\hat{f}_j(\lambda)$, for j = 1, 2, ..., $N_B$, will then be taken over each block. If the blocks are assumed to be independent, the average of the block estimates,

$$\hat{f}(\lambda) = \frac{1}{N_B}\sum_{j=1}^{N_B}\hat{f}_j(\lambda),$$

is essentially a chi-square random variable with $K = 2N_B$ degrees of freedom. The variability of such an estimate is intimately linked to its resolution. Full resolution refers to two sinusoids of the same amplitude being completely distinguishable when viewed through the spectral window function in the frequency domain. For the finite Fourier transform technique, full resolution requires that frequencies be roughly separated by

$$\Delta\lambda = \frac{2\pi}{T} \quad \text{or} \quad \Delta f = \frac{1}{T}.$$

T  T 

Now, by breaking the data into blocks of length $T_B$ the effective length has changed from T to $T_B$. Thus, the bandwidth of the estimate increases to

$$\Delta f = \frac{1}{T_B}.$$

Since $K = 2N_B = 2T/T_B$, we have $K = 2T\Delta f$, which shows the tradeoff between variability and frequency resolution. The estimate

$$\hat{f}(\lambda) = W_s\hat{X}(\lambda)\overline{\hat{X}(\lambda)}$$

of the stationary case was a chi-square random variable but the estimate

$$\hat{f}(\lambda,\theta) = W_s\hat{X}(\lambda)\overline{\hat{X}(\theta)}$$

in the case of a PC process was not so. However, the latter estimate similarly reduces in variability when the process is blocked in integer multiples of the period, i.e. $T_B = nT$, T being the period.
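
The block-averaged two dimensional estimate can be sketched as follows. The code assumes a boxcar data window, unit sample spacing, time measured from the start of each block, and a deterministic modulated sinusoid as stand-in data. All of these are illustrative assumptions, and the function names are hypothetical.

```python
import cmath
import math

def block_transform(block, lam):
    """(1/2pi) * sum_t x_t exp(-i lam t), with t measured from the block start."""
    return sum(x_t * cmath.exp(-1j * lam * t)
               for t, x_t in enumerate(block)) / (2.0 * math.pi)

def blocked_estimate(x, block_len, lam, theta):
    """Average over blocks of W_s * X_hat(lam) * conj(X_hat(theta)), boxcar window."""
    n_blocks = len(x) // block_len
    w_s = 2.0 * math.pi / block_len        # 2pi / sum n_t^2 for a boxcar
    total = 0j
    for j in range(n_blocks):
        block = x[j * block_len:(j + 1) * block_len]
        total += w_s * block_transform(block, lam) * block_transform(block, theta).conjugate()
    return total / n_blocks

# Stand-in data with modulation period T = 8; blocks hold two periods each.
T = 8
x = [math.sin(0.3 * t) * (1.0 + 0.5 * math.cos(2 * math.pi * t / T))
     for t in range(64)]
lam = 2 * math.pi * 3 / 16
theta = lam + 2 * math.pi / T              # one support-line spacing away
cross = blocked_estimate(x, 16, lam, theta)
swapped = blocked_estimate(x, 16, theta, lam)
diagonal = blocked_estimate(x, 16, lam, lam)
```

On the diagonal the estimate reduces to the averaged one dimensional estimate and is real and nonnegative. Off the diagonal it is complex and Hermitian in $(\lambda,\theta)$. Choosing the block length as a multiple of the period keeps the modulation phase the same in every block, so the off-diagonal products add coherently.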


5  Spectral  Analysis  of  Helicopter  Noise 

We  will  first  examine  the  correlation  of  the  acoustic  pressure  time  history  X(t)  from 
an  isolated  helicopter  rotor.  X(t)  is  a  discrete  process  representing  the  time  history 
sampled  at  discrete  intervals.  It  is  this  sample  data  which  is  used  to  produce  our 
spectral estimate. Due to the rotating nature of the helicopter blades we can assume that the time history is periodically correlated. One might further assume that this time history is actually doubly periodically correlated (DPC). If this were the case, then its covariance function R(s,t) would be periodic in both s and t. The correlation function being periodic in s can then be written as

(6) 

A-=0 

for  all  s  G  Z.  Now  since  R(s,t)  is  also  periodic  in  t,  i.e.  +  T)  =  R(s,f},  we  can 

write 

^  +r)e^,  /or  all  s,t  €  Z. 

k=0  Jt=0 

For  each  fixed  t  we  have 

0  =  X]  +  T)  —  Rfc(<)]e“^ ,  f  oralis  €  Z. 
k=Q 

which  implies  that 

Rk{t  +  T)  =  nk{t),  for  all  k  G  Z. 

Now,  since  t  was  arbitrary,  Rk{t)  is  periodic  for  each  k,  and  we  can  write 

^k{t)  =  E  Okje^ 

Substituting  this  into  (6)  and  simplifying  the  result  we  get 

T— 1 T— 1 

^(^’0  =  X  X  ^  (') 

k=0  j=o 

This means that the spectrum of a DPC process is supported on the points (see figure 1)

$$\{(2\pi k/T,\; 2\pi j/T) : j, k = 0, 1, \ldots, T-1\}. \qquad (8)$$

However, as we will show later in figure 3, the spectral support of our data is not on lattice points and hence our process is not DPC.

Figure 1. Spectral Support for DPC Processes

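Now we can try to get a representation for the process $X(t)$ itself. To do this one should first verify that the matrix $A = [a_{kj}]$ is positive definite (we omit the proof). This done, it is well known that $A = [a_{kj}]$ is the covariance matrix of some zero-mean Gaussian random vector $(Y_0, Y_1, \ldots, Y_{T-1})$. That is to say

$$a_{kj} = E[Y_k \overline{Y_j}]$$

for all $k, j = 0, 1, \ldots, T-1$. Therefore, we can write (7) as

$$R(s,t) = \sum_{k=0}^{T-1} \sum_{j=0}^{T-1} E[Y_k \overline{Y_j}]\, e^{2\pi i (ks - jt)/T}$$

or

$$R(s,t) = E\left[\left(\sum_{k=0}^{T-1} Y_k\, e^{2\pi i k s/T}\right) \overline{\left(\sum_{j=0}^{T-1} Y_j\, e^{2\pi i j t/T}\right)}\right].$$

This means that $X(t)$ can be taken to be

$$X(t) = \sum_{k=0}^{T-1} Y_k\, e^{2\pi i k t/T}, \quad \text{for all } t \in Z.$$

This in particular shows $X(t) = X(t+T)$ for all $t$, which means the process $X(t)$ is periodic. Now, since $X(t)$ is periodic, its prediction is easy once its period is known.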
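The periodicity of this representation is easy to check numerically. The sketch below draws one realization of the coefficients and verifies $X(t+T) = X(t)$. The period `T = 8` and the independent complex Gaussian coefficients are assumptions made for illustration; the text allows any covariance matrix $[a_{kj}]$.

```python
import cmath
import math
import random


def periodic_process(y, n_samples):
    """Evaluate X(t) = sum_k Y_k exp(2*pi*i*k*t/T) for t = 0..n_samples-1."""
    period = len(y)                       # T, the number of harmonics
    return [sum(yk * cmath.exp(2j * math.pi * k * t / period)
                for k, yk in enumerate(y))
            for t in range(n_samples)]


random.seed(1)
T = 8                                     # hypothetical period, in samples
# One draw of the random coefficients (independent components assumed here).
Y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(T)]
X = periodic_process(Y, 3 * T)
max_gap = max(abs(X[t + T] - X[t]) for t in range(2 * T))
print(max_gap)                            # zero up to floating-point rounding
```

Once the $T$ coefficients are known from one period, every later value of the realization is determined, which is the sense in which prediction is easy.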
5.1  Two-dimensional  analysis

Our  aim  in  the  rest  of  this  section  is  to  use  the  results  discussed  above  to  analyze 
our helicopter noise. The standard approach of studying the data is to treat it as if
it  were  stationary  and  hence  produce  and  study  its  one  dimensional  power  spectrum. 
However,  in  actuality  the  data  produced  from  the  isolated  tail  rotor  or  isolated  main 
rotor  is  not  stationary  but  PC.  So  it  takes  a  two-dimensional  power  spectrum  to 
adequately  study  such  data.  Furthermore,  the  combined  main  and  tail  rotor  data  is 
neither  PC  nor  stationary  and  it  is  likely  to  be  CAR.  Hence  this  data,  again,  requires 
a  two  dimensional  spectral  analysis. 

In Section 3, an estimate was developed for a nonstationary process. We will use this estimate to produce two-dimensional spectral estimates for the data. The technique consists of estimating the spectrum over a region of values of $\lambda$ and $\theta$ [...]. In the case of the isolated main and tail rotors, blocking was done in multiples of the period of the process. This was done because of the characteristics of PC processes. To ensure that these points are actually viewed, a length of data which is a multiple of the period of the process must be used when the data is Fourier transformed.
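The role of blocking in multiples of the period can be sketched as follows. This is an illustrative stand-in, not the Fortran program described in the text: the estimate is taken as the block average of $X_j(\lambda)\overline{X_j(\theta)}$ with no spectral window, the normalization is an assumption, and the test signal is a deterministic periodic signal with a hypothetical period of 8 samples.

```python
import cmath
import math


def dft(block):
    """Direct DFT of one block; bin k corresponds to frequency k / len(block)."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(block))
            for k in range(n)]


def two_dim_estimate(x, block_len):
    """Average X_j(lam) * conj(X_j(theta)) over non-overlapping blocks."""
    n_blocks = len(x) // block_len
    transforms = [dft(x[j * block_len:(j + 1) * block_len])
                  for j in range(n_blocks)]
    est = [[0j] * block_len for _ in range(block_len)]
    for xj in transforms:
        for a in range(block_len):
            for b in range(block_len):
                est[a][b] += xj[a] * xj[b].conjugate() / (n_blocks * block_len)
    return est


period = 8                                   # hypothetical period, in samples
x = [math.cos(2 * math.pi * t / period) + 0.5 * math.cos(4 * math.pi * t / period)
     for t in range(16 * period)]
block_len = 4 * period                       # block = integer multiple of period
est = two_dim_estimate(x, block_len)
step = block_len // period                   # harmonics fall on bins 0, 4, 8, ...
print(abs(est[step][2 * step]))              # off-diagonal power between harmonics
print(abs(est[step][step + 1]))              # essentially zero off the lattice
```

Because the block length is an integer multiple of the period, each harmonic lands exactly on a frequency bin, so the off-diagonal power shows up at isolated points and does not leak into neighboring bins.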

The data used to produce the spectral estimates in this section were taken on Sikorsky Aircraft's Basic Model Test Rig, the details of which can be found in [?]. The data was taken from several locations around the helicopter model (figure 2). By analyzing data from several different positions, the noise pattern produced could be used to determine how the noise is radiated in different directions and to identify common characteristics.


Figure  2.  Diagram  of  Microphone  Locations 
In producing the two-dimensional spectrum for the isolated tail rotor, a Fortran program was written to produce values of the [...] 450 Hz, which was determined from graphed data (see figure 3, where lines for the shaft frequency are also present) produced by another computer program based on our discussion in Section 2. [...] test conditions. The passing blade produces a corresponding periodic acoustic pressure time history with a fundamental frequency of 450 Hz. After the data was input into our program for producing a two-dimensional spectrum, a set of values was given in the output for the spectral estimate at frequencies $\lambda$ and $\theta$ going from zero frequency to a chosen upper frequency (for us 4800 Hz). We viewed the data and decided that, due to the background noise at low frequencies, the scaling of the graph did not allow enough of the power at the fundamental frequency and at harmonics of the tail rotor period to be shown. Therefore, we decided to filter the data using a Chebyshev digital filter. After we removed all frequencies below 250 Hz, the graph of the output data revealed more detail of the power at the fundamental frequency and its harmonics (see Figures 4 and 5).


Figure  3.  Spectral  Coherence  for  Isolated  Tail  Rotor  Data 
Figure 4. Tail Rotor Spectrum Microphone 2

Figure 5. Tail Rotor Spectrum Microphone 5

The graphs show most of the power for the tail rotor at the fundamental frequency (450 Hz) and the first and second harmonics (900 and 1350 Hz respectively). There are components of the main sinusoids at frequencies $\lambda$ and $\theta$ where $\lambda \neq \theta$. As for the data sampled from the isolated main rotor time history, this data is periodic with a frequency of 95 Hz. The first graphs we produced from this data showed high power levels at around 95 Hz. We therefore decided to filter frequencies below 250 Hz out of this data also. This resulted in the second harmonic of the main rotor noise (285 Hz) being dominant. At microphone 5 the background tonal noise between 800 and 1300 Hz, as reported in [?], is apparent (see figures 6 and 7 for the spectra produced from data taken by microphones 2 and 5).
Figure 6. Main Rotor Spectrum Microphone 2

Figure 7. Main Rotor Spectrum Microphone 5

The combined noise is the noise produced when both the main and tail rotors are working. Its spectrum was of interest because, if the noises from the isolated main and tail rotors are independent, the theory shows that the spectrum of the combined noise is just the sum of the spectra of the isolated main and tail rotors, i.e. the spectrum of the combined noise must be the same as the combined spectrum of the noises. We decided to determine the dependence of these noises by comparing the spectrum of the combined noise (see figures 8 and 9) with the added spectrum of the noises of the isolated tail rotor and main rotor.
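The additivity claim can be made explicit with a short expansion. The symbols $X_1$ and $X_2$ for the finite Fourier transforms of the main and tail rotor noises are introduced here for illustration, and both noises are assumed to have zero mean:

```latex
\begin{align*}
E\big[(X_1(\lambda)+X_2(\lambda))\,\overline{(X_1(\theta)+X_2(\theta))}\big]
 &= E\big[X_1(\lambda)\overline{X_1(\theta)}\big]
  + E\big[X_2(\lambda)\overline{X_2(\theta)}\big] \\
 &\quad + E\big[X_1(\lambda)\overline{X_2(\theta)}\big]
  + E\big[X_2(\lambda)\overline{X_1(\theta)}\big].
\end{align*}
```

If the two noises are independent, each cross term factors into a product of means and vanishes, leaving the sum of the two individual spectra. A visible difference between the combined and added spectra therefore points to nonzero cross terms, that is, to dependence.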
Figure 8. Combined Spectrum Microphone 2

Figure 9. Combined Spectrum Microphone 5

We know from theory that if these noises are independent then the added spectrum (figures 10 and 11) must be the same as the spectrum of the combined noise. We then compared the added spectrum of the noises to the spectrum of the combined noise. Since the spectra were noticeably different, we concluded that these noises must be dependent.




Figure  10.  Added  Spectrum  Microphone  2 


Figure  11.  Added  Spectrum  Microphone  5 


We note again that the combined noise is not necessarily PC, because it is the sum of two PC processes with incommensurate periods. It is probably a more general nonstationary process like CAR. According to [...] blocked with respect to one period, we chose to block the data with respect to the period of the tail rotor noise, which will naturally result in the estimated spectrum being skewed to some degree. However, the same amount of skewing occurs in the estimated spectrum for the added noise, which is also blocked with respect to the period of the tail rotor noise. Therefore, the estimated spectra of the combined and added noise can be compared on an equal footing with respect to theory.
5.2  Assumed  Stationarity

The standard method for handling data taken from helicopter noise is to treat the data as if it were stationary. This results in a one-dimensional spectrum along the diagonal. Therefore, the spectrum excludes all information concerning correlation of sinusoidal amplitudes at frequencies where $\lambda \neq \theta$. Although these neglected [...] studied. An example of a sound pressure spectrum for the tail rotor, produced using this method, is given in figure 12.

The graph produced has 22 degrees of freedom and a frequency resolution of 49 Hz. There are noticeable peaks at the fundamental frequency and subsequent harmonics. In the graph the pressure level drops off very rapidly. Notable peaks start to drop off after 3000 Hz. Due to the frequency range of audible sound (2 to 20 kHz) we expect to find high pressure levels above 3000 Hz also. However, this is not evident with this technique. This leads us to study another technique which utilizes a random shift to reduce this deficiency.
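As a rough consistency check on these figures, and assuming that the relations $K = 2N_b$ and $\Delta f = 1/T_b$ for non-overlapping blocks apply here (an assumption, since the text does not state the record length used), the quoted values would imply:

```latex
\begin{align*}
N_b &= K/2 = 22/2 = 11, \\
T_b &= 1/\Delta f = 1/(49\ \mathrm{Hz}) \approx 20.4\ \mathrm{ms}, \\
T &\approx N_b T_b = K/(2\,\Delta f) = 22/(98\ \mathrm{Hz}) \approx 0.224\ \mathrm{s}.
\end{align*}
```

Under the same assumption, halving $\Delta f$ for a fixed record length would halve the number of degrees of freedom.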


Figure 12. Tail Rotor Sound Pressure Level Microphone 3

5.3  Random  Phase  Shift

A PC process becomes stationary when a random phase shift, uniformly distributed over the period of the process, is applied. This stationary process can then be adequately analyzed by the standard method discussed above. The new stationary process is no longer the original process, and thus no longer contains information about the correlation of Fourier components at different frequencies. To implement the technique, we take a length of data and break it into blocks. Each block is of the length necessary to obtain a desired frequency resolution plus the period of the underlying process. When the program is implemented, a random function call chooses a sample index uniformly distributed over the period of the process. Starting with the sample at that index, we keep the number of samples needed for the desired frequency resolution. This is done for each block of data. This technique results in wastage of some sample values from the start of the record, but does implement the random phase shift while maintaining the desired number of samples per block.
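The blocking scheme just described can be sketched in a few lines. This is an illustration under stated assumptions, not the original program: the function name, the parameter names `n_res` (samples needed for the desired resolution) and `period` (samples per period), and the stand-in record are all hypothetical.

```python
import random


def random_shift_blocks(x, n_res, period, rng):
    """Cut x into blocks of n_res + period samples, then keep n_res samples
    from each block starting at a random offset in [0, period)."""
    raw_len = n_res + period
    n_blocks = len(x) // raw_len
    shifted = []
    for j in range(n_blocks):
        block = x[j * raw_len:(j + 1) * raw_len]
        offset = rng.randrange(period)     # uniform over one period
        shifted.append(block[offset:offset + n_res])
    return shifted


rng = random.Random(0)
x = list(range(1000))                      # stand-in for the sampled record
blocks = random_shift_blocks(x, n_res=64, period=16, rng=rng)
print(len(blocks), {len(b) for b in blocks})
```

Every block keeps exactly `n_res` samples, so the frequency resolution is unchanged; the cost is that `period` samples per block are discarded.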

The sound pressure spectra produced from this shifted data can be compared to that of the original (unshifted) data. After viewing the graph in figure 13, it is apparent that the result is a smoothed spectrum, as if a moving average had been applied to the spectrum. There are additional peaks in the spectrum which correspond to the shaft frequency. Since the shaft frequency is one fourth of the blade passage frequency, there could be three peaks between the harmonics of the spectrum for the tail rotor noise. This is evident in [?], in which a frequency resolution of 12 Hz is used. This resolution is fine enough to show the harmonics. However, our graph has a frequency resolution of 50 Hz, which is not fine enough to completely show the shaft harmonics.
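The arithmetic behind this remark can be written out, taking the 450 Hz fundamental quoted earlier as the blade passage frequency (an assumption about which frequency is meant):

```latex
\begin{align*}
f_{\text{shaft}} &= f_{\text{blade}}/4 = 450/4 = 112.5\ \mathrm{Hz}, \\
\text{peaks between 450 and 900 Hz:}\quad & 562.5,\ 675,\ 787.5\ \mathrm{Hz}, \\
f_{\text{shaft}}/\Delta f &= 112.5/50 = 2.25 \quad \text{versus} \quad 112.5/12 \approx 9.4 .
\end{align*}
```

With a 50 Hz resolution, adjacent shaft harmonics are only about two resolution cells apart, which is consistent with their being incompletely resolved.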


Figure  13.  Tail  Rotor  Sound  Pressure  Level  with  Random  Shift 

This technique of shifting the data appears to be useful in [...] harmonics of the transformed data. The output resulting from this method was also used to produce a two-dimensional spectral estimate. The graph for [...] stationary process from a PC process.

Figure 14. Tail Rotor Spectrum with Random Shift Microphone 3


Conclusions 

We discussed a method for producing a two-dimensional spectral [...] than considering nonstationary data as being stationary. We verified that applying a random phase shift to the data results in a more useful spectrum for viewing higher harmonics than when the data is [...] method above appear to drop off too quickly. For a further study, we feel one must investigate the off-diagonal values of the spectral estimate as to how their presence can be used to characterize the process itself.
ABSTRACT

There are many reasons for wanting to quantify spatio-temporal correlations in geophysical signals over a large range of scales $r$. Standard approaches use either the autocorrelation function $\langle f(x+r)f(x)\rangle$ or the related 2nd-order structure function $\langle [f(x+r)-f(x)]^2\rangle$ or, equivalently (Wiener-Khinchin theorem), the wavenumber spectrum $E(k)$, with $k \approx 1/r$ as the scale parameter. These are all 2nd-order statistics however, and they do not discriminate well between fields with sometimes radically different spatial properties. For example, in seismic signals the background (possibly instrumental) noise can be modeled as white whereas the interesting events are more like Dirac $\delta$-functions: both components are $\delta$-correlated in the sense of $\langle f(x+r)f(x)\rangle$ and have correspondingly flat wavenumber spectra $E(k)$ = constant. In another instance, temporal fluctuations of air temperature are Brownian motion-like, with $\langle [f(x+r)-f(x)]^2\rangle \propto r$ (hence $E(k) \propto k^{-2}$),
under quiescent meteorological conditions; unfortunately, the same spectrum is assigned
to  the  occurrence  of  a  quasi-discontinuity  marking  the  passage  of  a  front,  as  approximated 
by  a  Heaviside  step  function.  The  issue  at  hand  is  resolved  by  introducing  the  notion  of 
“intermittency,”  a  concept  borrowed  from  turbulence  theory  that  describes  the  occurrence 
of  bursts  of  intense  events;  statistically  speaking,  we  are  faced  with  the  break-down  of 
the  prevailing  Gaussian  paradigm  in  data  analysis.  To  characterize  intermittency,  some 
form  of  wavelet-type  time/frequency  (or  position/scale)  analysis  is  required.  Multifractal 
approaches  to  position/scale  analysis  are  particularly  easy  to  exploit:  they  use  higher- 
order  moments  as  a  simple  way  of  sorting  the  continuum  of  weak,  intermediate,  and 
strong  events,  and  we  look  for  power-law  regimes  in  the  resulting  scale-dependent 
statistical  quantities  at  all  orders.  The  two  main  categories  of  multifractal  analysis, 
$q$th-order structure functions and singularity analysis, are surveyed and illustrated with
both models and cloud-related data in 1D and 2D. We address in detail the sampling (or
“ergodicity”)  problems  that  arise  as  soon  as  Gaussian  assumptions  are  relaxed  and  their 
relation  to  both  stationarity  and  intermittency  is  discussed.  Finally,  we  outline  how 
multiscaling  has  helped  to  further  the  theory  of  cloud-radiation  interaction,  as  applied  to 
the  forcing  of  the  climate  system  and  the  remote  sensing  of  cloud  properties. 


†Also: Science Systems and Applications, Inc. (SSAI), 5900 Princess Garden Parkway, Lanham, Md.

1.  Introduction
1.1  Background 

There  is  an  increasing  need  in  the  geophysical  community  for  statistical  analysis  of 
data.  This  need  is  traditionally  met  with  techniques  developed  within  entirely  different 
areas  of  research.  For  example,  first  and  second  order  statistics  — means,  variances  and 
covariances —  found  their  foremost  applications  in  psychometrics.  These  well-known 
quantities  are  the  parameters  of  the  most  general  multivariate  Gaussian  distribution. 
Their  counterparts  in  time-series  analysis  are  the  2-point  autocorrelation  function  and  the 
wavenumber  spectrum  which  proved  to  be  powerful  tools  for  solving  engineering 
problems  in  communications  and  signal  processing. 

Over  the  decades,  it  became  apparent  that  Gaussian  — and  otherwise  “thin-tailed” — 
statistics  were  ill-suited  to  describe  many  random  signals  that  reflect  the  variability  of 
geophysical  fields.  Stretched  exponentials,  log-normal  and  even  power-law  distributions 
were  introduced  to  describe  seismic  activity,  rain-rates,  atmospheric  turbulence,  and 
numerous  other  natural  phenomena.  Opening  the  Pandora’s  box  of  non-Gaussian 
statistics  with  “long-”  or  “fat-tailed”  distributions  (Waymire  and  Gupta  1981)  raises 
important  questions  about  sampling.  We  will  refer  to  these  issues  generically  as 
“ergodicity”  problems,  a  terminology  that  better  reflects  our  model-based  investigation. 
The  basic  question  is:  Do  space/time  averages  converge  to  well-defined  ensemble 
counterparts  with  sample  size  and  how  fast?  More  formally  put:  How  much  of 
probability  space  do  we  need  to  explore  to  characterize  a  distribution?  For  exactly 
Gaussian processes, the answer to these questions is more-or-less contained in the “3σ”
rule: events more than three standard deviations away from the mean are improbable at
the level 0.001.¹ We will show further on examples of natural variability where the
standard  deviation  itself  is  not  even  pinned  down  after  many  thousands  of  observations. 
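The tail probabilities behind such rules are simple to evaluate. The sketch below computes the two-sided probability of exceeding a given number of standard deviations for a Gaussian and for Laplace's law; the function names are introduced here for illustration. For the Gaussian the two-sided value at three standard deviations is closer to 0.003 (about 0.0013 one-sided), so the quoted level should be read as an order of magnitude.

```python
import math


def gaussian_two_sided_tail(n_sigma):
    """P(|X - mu| > n_sigma * sigma) for a Gaussian variable."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def laplace_two_sided_tail(n_sigma):
    """Same probability for Laplace's law with standard deviation sigma."""
    return math.exp(-math.sqrt(2.0) * n_sigma)


print(gaussian_two_sided_tail(3.0))   # roughly 0.0027
print(laplace_two_sided_tail(3.0))    # roughly 0.014
print(laplace_two_sided_tail(4.9))    # roughly 0.001
```

The same three-standard-deviation excursion is about five times more likely under Laplace's law than under a Gaussian, which is why the threshold for a comparable level moves out to nearly five standard deviations.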

In  our  experience  with  geophysical  data  analysis,  we  have  encountered  at  least  two 
sources  of  ergodicity  problems  that  generally  appear  compounded:  “intermittency,”  and 
“nonstationarity.”  We  are  adopting  vocabulary  from  time-series  analysis  here  for 
simplicity  but  have  either  space  or  time  in  mind.  [Rather  than  “(non)stationarity,”  the 
technically  correct  usage  in  random  field  theory  is  statistical  “(in)homogeneity.” 
However, in keeping with cloud-modeling usage, we will reserve this last expression to
designate (non-)constant fields which, in turn, are “(non-)trivial” from the time-series
perspective.] 

•  It is natural to think that estimates of statistical properties of geophysical signals are independent of the instant when observations start and stop.² This is effectively a “stationarity” assumption, i.e., averages are statistically invariant under translations in space or time. Observation time is irrelevant to systems in some kind of dynamical equilibrium. However, very long-range correlations can and do occur in geophysical signals because of the sheer size of the system and the coherent, long-lived structures generated by the large-scale forcing and the highly nonlinear character of the dynamics. In other words, stationarity is generally not a good assumption, at least at close range (small scales). The weaker assumption of nonstationarity with stationary increments is generally good enough at all scales (examples to follow). Only when the record is exceptionally long do we observe a transition to stationarity per se. In summary, the questions to ask are ‘for how long are the data correlated?’ or ‘how long must we wait to isolate independent samples in a given datastream?’

¹ Every “Gaussian-type” or “thin-tailed” distribution has a similar rule; e.g., deviations in excess of ≈4.9 σ’s from the mean μ are unlikely at the level $10^{-3}$ for Laplace’s probability law: $\mathrm{Prob}\{X < \xi < X+dX\} = \exp[-\sqrt{2}\,|X-\mu|/\sigma]\,dX/(\sqrt{2}\,\sigma)$.

² We assume here that all the observations belong to a well-defined class within which we can perform meaningful statistics. At least in atmospheric applications, this may impose external limits on the time of day, the season and the position on the globe.

•  Assuming  that  the  nonstationarity  has  been  tamed  by  focusing  on  the  appropriate 
quantities  to  be  averaged  (e.g.,  increments),  we  can  still  be  faced  with 
“intermittency”  problems.  We  borrow  this  concept  from  turbulence  theory  to 
describe  the  occurrence  of  sudden  bursts  of  intense  variability,  very  uncharacteristic 
of  Gaussian  processes  — stationary  or  not.  Following  a  now  well-established 
tradition  in  turbulence  studies  (Parisi  and  Frisch  1985,  Meneveau  and  Sreenivasan 
1987a,  etc.),  we  can  use  “multifractal”  statistics  to  describe  intermittency  in  natural 
signals.  These  are  straightforward  generalizations  of  the  2nd  order  statistics 
mentioned  above  where  moments  of  all  orders  — within  limits  set  by  sampling 
considerations —  are  computed  on  a  scale-by-scale  basis  and  where  the  dependence 
on  scale  is  parameterized  by  power  laws.  We  thus  define  “scaling”  regimes  and 
associated  families  of  exponents. 
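The scale-by-scale moment computation described in the second item can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are introduced here, the exponent is obtained by an ordinary least-squares fit in log-log coordinates, and the input is a smooth test ramp, not cloud data.

```python
import math


def structure_function(f, r, q):
    """q-th order structure function <|f(x+r) - f(x)|^q> at lag r."""
    diffs = [abs(f[m + r] - f[m]) ** q for m in range(len(f) - r)]
    return sum(diffs) / len(diffs)


def scaling_exponent(f, q, lags):
    """Least-squares slope of log S_q(r) against log r."""
    xs = [math.log(r) for r in lags]
    ys = [math.log(structure_function(f, r, q)) for r in lags]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


ramp = [0.5 * m for m in range(256)]        # smooth test signal, not cloud data
lags = [1, 2, 4, 8, 16, 32]
for q in (1, 2, 3):
    print(q, round(scaling_exponent(ramp, q, lags), 6))   # exponent equals q
```

For the ramp, every increment at lag $r$ equals $r/2$, so the exponent is exactly $q$. The smooth ramp serves only as a calibration case with a known answer.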

A  priori,  nonstationarity  and  intermittency  are  purely  qualitative  attributes  when  it  comes 
to  data  analysis.  Furthermore,  some  if  not  all  of  the  positional  information  needed  to 
make  a  statement  about  stationarity  is  lost  due  to  the  spatial  averaging  that  produces  the 
statistics  in  the  first  place.  We  will  show  that,  in  the  framework  of  scale-invariant 
processes,  nonstationarity  and  intermittency  can  both  be  not  only  detected  (cf.  §5.1  and 
§6.3)  but  precisely  quantified  as  well  (cf.  §4.5). 

Fractal  (single-moment)  and  multifractal  statistics  were  originally  perceived  as 
abstract  and  were  criticized  for  having  little  bearing  on  the  underlying  physics.  Serious 
efforts  have  been  put  forth  to  make  multifractal  concepts  attractive  to  a  broad  range  of 
geophysicists  (e.g.,  Davis  et  al.  1994a),  and  their  connections  with  wavelet  analysis  are 
now  well-understood  (Muzy  et  al.  1994).  It  is  true  that  fractal  concepts  become 
mathematically  precise  in  the  small-scale  limit  ...  which  is  generally  unjustified  on 
physical  grounds.  Nevertheless,  “physical”  fractals  (with  well-defined  inner-  and  outer- 
scales)  have  proven  to  be  very  helpful  models  of  reality  in  a  broad  range  of  applications. 
In  fact,  the  limits  of  the  scaling  regimes  themselves  convey  as  interesting  information  on 
the  system  as  do  the  exponents,  if  not  more.  We  will  survey  our  findings  in  this  area  with 
respect  to  cloud  structure  and  cloud-radiation  interaction. 

1.2  Overview:  The  “Laboratory”  Model  for  Geophysical  Data  Analysis 

Figure 1: Flow-Chart for the “Laboratory” Model for Scale-by-Scale Statistical Analysis of Geophysical Data. There are two feed-back loops. One signifies that new theory makes new predictions to be verified with new data (from new instruments if necessary). The other represents the production of synthetic data with stochastic models to calibrate the statistical “instrumentation” with standard input. Stochastic modeling also feeds into the body of theory that explains the data. In the case of cloud-radiation theory, the numerical modeler can control the weather in his digital cloud system; the simplest models may be amenable to analytic methods.

The paper is organized as follows, in the spirit of a report on a laboratory experiment (cf. Fig. 1). The goal of the “experiment” is to characterize the structure of marine stratocumulus for the purposes of radiative transfer computations. These are climatologically important cloud systems, and our main motivations to study them are (i) improved models for understanding the atmospheric radiation budget, hence present and future climate, and (ii) improved methods for retrieving cloud properties from remotely-sensed signals.

•  Section 2 (“Sample Collection”): we present 1D and 2D data pertaining to marine
stratocumulus  and  further  motivate  their  study. 

•  Section  3  (“Instrumentation”)  describes  the  basic  tools  of  scale-by-scale  statistical 
analysis:  coarse-graining,  autocorrelation,  structure  functions  and  spectral  analysis. 

•  Section  4  (“Results”)  establishes  the  relevance  of  power-law  parameterizations  for 
scale-dependent  cloud  statistics  and  defines  notations  for  the  associated  exponents. 

•  Section  5  (“Semi-Empirical  Criteria”)  shows  how  we  interpret  certain  scaling 
properties,  with  an  emphasis  on  ergodicity  issues:  stationarity,  intermittency  and  the 
onset  of  sampling  problems  are  discussed. 

•  Section  6  (“Theoretical  Considerations”):  we  rephrase  our  outlook  on  data  analysis  in 
terms  of  symmetry  and  broken  symmetry;  this  impacts  directly  our  understanding  of 
cloud  structure  and  how  it  transpires  in  satellite  images.  We  also  draw  parallels 
between  our  methods  and  those  of  statistical  physics. 

•  Section  7  is  a  summary. 

•  Appendix  (“Calibration  and  Simulation”):  a  number  of  scale-invariant  models  are 
introduced  and  classified  according  to  the  criteria  in  Section  5.  These  algorithms  for 
generating  synthetic  data  have  many  applications,  our  present  concerns  being  (1) 
assessment  of  the  reliability  of  analysis  procedures  and  (2)  simulation  of  realistic 
clouds-in-a-computer  that  enable  numerical  radiation  transport  studies.  In  all  cases, 
the  statistical  properties  — namely,  the  exponents —  are  known  a  priori. 


2.  Sample  Collection  (Cloud  Data  as  an  Object  of  Study) 

When designing an experiment, generally to test some hypothesis, the first questions to ask are ‘What are we going to study?’ and ‘Why?’. We can view data analysis as a straightforward experimental procedure where the object of study is the data itself (Fig. 1). We will assume it to be stored in a 1D or 2D array of real numbers residing in computer memory. In this section, we present geophysical data and present a rationale for an in-depth study of its statistical properties. The next questions are closely related to each other: ‘What properties are we interested in?’ and ‘What instruments will we use to probe our sample?’. They are addressed in sections 3-4.

We  present  here  data  used  in  our  specific  research  area:  internal  structure  of  marine 
stratocumulus  (Sc)  and  its  impact  on  radiation  transport.  Generally  speaking,  cloud- 
radiation  interaction  is  a  source  of  considerable  uncertainty  in  the  prediction  of  climate 
and  climate  change.  Being  both  persistent  and  extended,  marine  Sc  layers  are  responsible 
for  a  large  portion  of  the  Earth’s  global  albedo,  hence  the  planet’s  overall  energetic 
equilibrium.  A  robust  statistical  characterization  of  marine  Sc  structure  is  therefore  in 
order.  In  particular,  this  will  allow  us  to  develop  statistically  realistic  cloud  models 
which can in turn be used to investigate radiation issues. An improved understanding of
how  physical  cloud  properties  relate  to  their  radiation  fields  has  important  spin-offs  in  the 
area  of  remote  sensing.  This  is  the  only  cost-effective  way  of  monitoring  cloud  cover 
from  synoptic  to  pixel  scales  (several  kilometers  or  meters,  depending  on  the  device). 

2.1  One-Dimensional  in  Situ  Transects  of  Liquid  Water  Content  from  FIRE 

2.1.1  Internal  Cloud  Structure  Using  Taylor's  Frozen  Turbulence  Hypothesis 

Clearly,  there  is  no  better  way  to  study  cloud  structure  than  by  direct  probing.  This 
calls  for  a  fully  instrumented  aircraft  and,  because  of  the  costs  involved,  few  datasets  of 
this  type  are  available.  Furthermore,  cloud  liquid  water  content  (LWC)  measurement  is 
still  an  area  of  active  research  (Gerber  et  al.  1994).  We  present  here  transects  of  LWC  in 
marine Sc that we will use to illustrate 1D data analysis in the remainder of the paper.

In Figs. 2a-e, we show representative samples of LWC vs. time from five flights (or flight legs) during FIRE³ in June-July 1987 off the coast of southern California. Following a well-established practice in the turbulence literature (Taylor’s frozen turbulence hypothesis), we perceive these time-series as 1D cuts through the spatially variable LWC field:

$$f_i(x_m), \quad x_m = m\ell, \quad m = 1, \ldots, M_i \quad (i = 1, \ldots, 5). \qquad (1)$$

Table  1  shows  the  important  parameters  of  the  datasets,  described  in  more  detail  by  Davis 
et  al.  (1996a).  In  particular,  they  tentatively  relate  the  down-spikes  that  characterize 
Figs.  2a, b  to  dynamical  instabilities,  and  they  question  the  reality  of  the  strong  “dip”  in 
Fig.  2c.  In  the  following  analyses,  events  affected  by  this  feature  are  ignored  but 
Marshak  et  al.  (1996)  examine  the  consequences  of  not  eliminating  the  spurious  dip. 

Table  1:  FIRE  Liquid  Water  Content  (LWC)  Database.  The  statistically  relevant  parameters  of  the  various 
datasets  are  collated.  They  were  obtained  from  an  airborne  platform  during  the  FIRE  1987  stratocumulus 
experiment,  off  San  Diego  (Ca.).  A  nominal  aircraft  speed  of  100  m/s  was  used  to  convert  time  to  space, 
the  sampling  rate  being  20  Hz. 
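The time-to-space conversion implied by these numbers can be sketched directly. The function name is hypothetical; the default speed and sampling rate are the nominal values quoted in the caption, which give a grid constant of 100/20 = 5 m.

```python
def transect_positions(n_samples, speed_m_per_s=100.0, sample_rate_hz=20.0):
    """Map sample indices m = 1..M to positions x_m = m * l along the flight
    path, with l = speed / sample_rate (Taylor's frozen-turbulence view)."""
    spacing = speed_m_per_s / sample_rate_hz      # l, in metres
    return [m * spacing for m in range(1, n_samples + 1)]


x = transect_positions(4)
print(x)   # [5.0, 10.0, 15.0, 20.0]
```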
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2.1.2  Visualizing  Intermittency  with  Small-Scale  Absolute  Gradients 

The  most  interesting  (i.e.,  strong  and  somehow  “organized”)  features  in  Figs.  2a-e  are 
the  large  and  well-localized  downwards  deviations  that  occur  intermittently  but 


¹First ISCCP Regional Experiment (ISCCP = International Satellite Cloud Climatology Project).

Figure 2: Liquid Water Content (LWC) Transects from In Situ Airborne Probings of Marine
Stratocumulus. (a-e) Representative portions of the five datasets described in Table 1; these are examples
of nonstationary processes with stationary increments (Fig. 4 and §5.1) and multiscaling structure functions
(Fig. 6 and §4.2). These data were collected in marine stratocumulus during FIRE in 1987 off the coast of
southern California. (a'-e') Absolute next-neighbor differences for the data in panels (a-e); these are
examples of intermittent stationary processes (Fig. 4 and §5.1) with bona fide multifractality (Table 2 and
§5.3), as revealed by singularity analysis (Fig. 8 and §4.4).


Figure  3:  Radiance  Field  of  Marine  Stratocumulus.  Gray-scale  rendering  of  a  4096x4096  portion  of  a 
Landsat  image  of  a  typical  cloud  deck  off  the  coast  of  southern  California.  This  scene  was  captured  at 
visible  wavelengths  (channel  2  of  the  Thematic  Mapper)  on  June  30  1987;  so  the  climatological  conditions 
are similar to those prevailing when the in situ LWC data in Fig. 2 were obtained.

nevertheless seem to cluster. The large jumps that characterize these events are
highlighted by taking the absolute gradients of the data at some small (but presently
unspecified) scale $\eta$. Adopting units where the sampling scale (grid constant) $\ell = 1$ for
simplicity, we have:

$$\varepsilon_i(\eta; x_m) = |f_i(x_m+\eta) - f_i(x_m)|, \quad m = 1,\dots,M_i-\eta \quad (i = 1,\dots,5). \tag{2}$$

The new data obtained from that in Figs. 2a-e with $\eta = \ell = 1$ are presented in Figs. 2a'-e'.
In turbulence studies where $f(x_m)$ is most often velocity, $\eta$ is taken to be the
“Kolmogorov” scale where dissipation forces start to dominate inertia. Furthermore, it is
traditional in this context to take squares rather than absolute values in Eq. (2) since this
yields a 1D cut through the physically important field that describes the local rate of
kinetic energy dissipation (Meneveau and Sreenivasan 1987a).
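The absolute-gradient operation of Eq. (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `absolute_gradients` and the toy transect are hypothetical (the numbers are made up, not FIRE data), and the grid constant is taken as 1 so that the scale `eta` is an integer number of samples.

```python
def absolute_gradients(f, eta=1):
    """Return |f[m + eta] - f[m]| for every m where both samples exist.

    Assumes a regularly sampled 1D series with grid constant 1, so that
    the gradient scale eta is an integer number of samples.
    """
    if eta < 1 or eta >= len(f):
        raise ValueError("eta must satisfy 1 <= eta < len(f)")
    return [abs(f[m + eta] - f[m]) for m in range(len(f) - eta)]


# Toy transect (hypothetical values) with a single sharp down-spike.
lwc = [0.30, 0.31, 0.29, 0.05, 0.28, 0.30]
eps = absolute_gradients(lwc, eta=1)
```

The down-spike shows up as two adjacent large values in `eps`, one for the drop and one for the recovery, while the rest of the series stays small.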

2.2 A Two-Dimensional Radiance Field from LANDSAT Captured During FIRE

An  attractive  alternative  to  in  situ  probing  of  cloud  structure  is  to  use  high-resolution 
satellite  imagery.  It  is  relatively  inexpensive  compared  to  outfitting  and  flying  research 
aircraft,  more  comprehensive  than  aircraft  probing  or  ground-based  radiometry  (being 
2D),  and  more  frequent  in  time. 

Figure 3 is a gray-scale rendering of a large (4096x4096) section of a cloudy
LANDSAT scene:

$$f(x_{m_1},y_{m_2}), \quad m_1 = 1,\dots,2^n, \quad m_2 = 1,\dots,2^n \quad (n = 12). \tag{3}$$

The signal $f$ is digitized over 256 levels and is almost proportional to nadir-viewing
radiance at satellite level (≈800 km). LANDSAT’s radiometer was not originally
designed for such bright targets as clouds, so saturation (at $f$ = 255) occurs frequently,
17% in our case. In order to avoid spurious saturation effects, the upcoming statistical
analyses use only the 2048x4096 leftmost pixels which are only 7% saturated.


3. Instrumentation (Scale-by-Scale Analysis Tools for 1D or 2D Datasets)

Pursuing  our  analogy  between  data  analysis  and  laboratory  work,  we  describe  the  first 
part  of  the  experimental  procedure.  What  “measurements”  are  we  going  to  do?  What 
“instruments”  are  we  going  to  use?  The  object  under  scrutiny  is  cloud  data  stored  in  a 
large portion of computer (and possibly peripheral) memory. The instruments are
programs that process this data; their outputs constitute statistical measurements. This
new “data,” residing in far less memory, partially describes the dataset. In essence, we
are  observing  the  statistical  “behavior”  of  the  data/subject  with  different  devices  and,  in  a 
sense  we  will  define  in  §3.3,  under  different  “experimental  conditions.” 

All  our  instruments  have  two  computational  stages,  performed  in  sequence  (§3.1)  or 
in  parallel  (§§3.2-4).  First  comes  an  analysis  procedure  that  yields,  in  general,  a  quite 
large  number  of  random  variables  by  resampling  and  operating  on  the  data.  This  is 
followed by a spatial/ensemble averaging. Consider, as an example, the computation of
1-point variance: it calls for (1) forming the 1st and 2nd powers of $f(x)$, then (2)
obtaining their averages over all the available data. In the following, we will consider
exclusively 2- and more-point statistics that contain information about correlations (or
“structure”) in the datastream. Since we always compute statistical properties at a
specific scale $r$, we refer to these techniques collectively as “scale-by-scale” analyses.


3.1  Spectral  Analysis 


Correlations  in  random  data  can  be  studied  via  Fourier  analysis,  leading  to  energy 
spectrum  estimation.  This  is  a  well-traveled  approach  to  scale-by-scale  analysis  where 
the scale parameter is wavenumber $k$, related to the length scale $r = 1/k$ in physical space.
Let $\tilde f(k)$ be the $d$-dimensional Fourier transform of a field $f(x)$ defined on $[0,L)^d$:

$$\tilde f(k) = \int_{[0,L)^d} f(x)\,\exp(2\pi i\,x\cdot k)\,\mathrm{d}^dx \tag{4a}$$

where the normalized wavevector $kL$ scans $\mathbb{Z}^d$. In the discrete case, the above integrals
become sums (with $\mathrm{d}^dx = \ell^d$):

$$\tilde f(k) = \ell^d \sum_x f(x)\,\exp(2\pi i\,x\cdot k). \tag{4b}$$

For $d = 1$, $k$ now goes from $-k_N$, excluded, to $+k_N$, included, where

$$k_N = \frac{1}{2\ell} \tag{5}$$

is the maximal (or “Nyquist”) wavenumber, by steps of $\Delta k = 1/L_2$ with¹

$$L_2 = M_2\ell = 2^{[\log_2 M]}\ell, \tag{6}$$

$[\cdot]$ designating integer part. For $d = 2$, the subset $(-k_NL_2,+k_NL_2]\otimes(-k_NL_2,+k_NL_2]$ of $\mathbb{Z}^2$ is
scanned by $kL_2 = (k_xL_2,k_yL_2)$. We note that, for $f(x) \in \mathbb{R}$, $\tilde f(-k) = \tilde f(+k)^*$.

In Fourier space, the spatial averaging step is replaced by a summation over phases,
equivalently, over wavenumber sign. We compute the energy² spectrum in $d = 1$ from

E(k)  =  {\\J(+k)\\'^+\\Ji-k)\\'^)Ak  =  <ll7(it)l|2>  (7a) 

for $kL \in \mathbb{N}$, and where $\delta_{0k} = 1$ for $k = 0$, 0 otherwise. In the discrete ($\ell > 0$) case, the
Fourier series is truncated, hence


¹The most popular Fast Fourier Transform packages (e.g., Press et al. 1993) require $M$ to be a power of 2.
If this is not the case, the first and last $M_2$ data points can be treated as two realizations in an ensemble
average. Generally speaking, ensemble averaging over a number of datasets poses no special problem as
long as they have the same $\ell$ and $L_2$; otherwise, common units for $k$ and $E(k)$ must be defined if different
$L_2$’s occur and $k$-bins must be used if different $\ell$’s occur.

²$E(k)$ is called interchangeably “power” or “energy” or “wavenumber” or “frequency spectrum,” and
sometimes “periodogram.”
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E(k)=^{\\J(k)\fi)  Ob) 

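for $kL_2 = 0,1,\dots,L_2/(2\ell)$. In $d = 2$ cases, we assume statistical isotropy and sum $\langle|\tilde f(\mathbf{k})|^2\rangle$
over circles of radius $k = |\mathbf{k}| = \sqrt{k_x^2+k_y^2}$ in Fourier space. For continuous spectra ($\ell = 0$,
$L \to \infty$, hence $\Delta k \to 0$), we have

$$E(k) = \int_{|\mathbf{k}|=k} \langle|\tilde f(\mathbf{k})|^2\rangle\,\mathrm{d}^{d-1}k = k\int_0^{2\pi} \langle|\tilde f(k\cos\theta,k\sin\theta)|^2\rangle\,\mathrm{d}\theta \tag{8a}$$

for $k > 0$. In the discrete ($L < \infty$, $\ell > 0$) case, $kL_2$ goes from 0 to $k_{\max}L_2 = [\sqrt{2}\,k_NL_2]$ and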

E(k)=^  X  (IIJ(*)lP>  (8*) 

^kLi^Li<kLa+\ 

for $kL_2 = 0,1,\dots,k_{\max}L_2$.

It is sometimes advantageous to cumulate and average $E(k)$ and $k$ in octave-wide bins
(i.e., by factors of 2 in $k$) for $k > 0$ (Davis et al. 1996a). There are precisely $[\log_2 M]-1$
bins when $d = 1$: $\{kL_2 \in \mathbb{N};\ 2^i \le kL_2 \le 2^{i+1}-1\}$, for $i = 0,\dots,[\log_2 M]-2$; this excludes
the Nyquist frequency, the most aliased anyway (Press et al. 1993). In $d = 2$ cases, one
more bin can be populated by wavevectors with their modulus between $k_N$ and $k_{\max}$; in
all, $\{\mathbf{k}L_2 \in (-k_NL_2,k_NL_2]^2;\ 2^i \le |\mathbf{k}|L_2 \le 2^{i+1}-1\}$ for $i = 0,\dots,[\log_2 M]-1$.
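The octave-binning rule for $d = 1$ can be illustrated with a minimal Python sketch. Several things here are assumptions made for illustration rather than the authors' implementation: the function names are hypothetical, a naive $O(M^2)$ discrete Fourier transform stands in for an FFT package, and the spectrum is normalized so that $E(k)\Delta k$ is the variance carried by wavenumber $k$ (the paper's own prefactors may differ).

```python
import cmath
import math


def power_by_wavenumber(f):
    """Variance carried by each |k| = n/L, n = 0..M/2, using a naive DFT."""
    M = len(f)
    c = [sum(f[m] * cmath.exp(2j * math.pi * n * m / M) for m in range(M)) / M
         for n in range(M)]
    half = M // 2
    power = [abs(c[0]) ** 2]                                 # k = 0: squared mean
    for n in range(1, half):
        power.append(abs(c[n]) ** 2 + abs(c[M - n]) ** 2)    # sum over +k and -k
    power.append(abs(c[half]) ** 2)                          # Nyquist appears once
    return power


def octave_binned_spectrum(f, ell=1.0):
    """Return [(mean k, mean E(k))] over the bins 2^i <= kL <= 2^(i+1) - 1."""
    M = len(f)
    if M < 4 or M & (M - 1):
        raise ValueError("len(f) must be a power of 2, at least 4")
    dk = 1.0 / (M * ell)                                     # wavenumber step 1/L
    E = [p / dk for p in power_by_wavenumber(f)]
    n_bins = M.bit_length() - 2                              # log2(M) - 1 bins
    bins = []
    for i in range(n_bins):
        ns = range(2 ** i, 2 ** (i + 1))                     # Nyquist is never reached
        bins.append((sum(n * dk for n in ns) / len(ns),
                     sum(E[n] for n in ns) / len(ns)))
    return bins


# Synthetic test signal: two pure modes at kL = 3 and kL = 5, M = 16 samples.
signal = [math.sin(2 * math.pi * 3 * m / 16) + 0.5 * math.cos(2 * math.pi * 5 * m / 16)
          for m in range(16)]
power = power_by_wavenumber(signal)
bins = octave_binned_spectrum(signal)
```

For real data the naive transform would be replaced by an FFT; the binning logic is unchanged.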

3.2  Two-Point  Correlation  Analyses  at  Order  2 

There  are  two  physical  space  counterparts  of  spectral  analysis  for  correlation  studies. 
First, one can form the products $\delta f(x+r)\,\delta f(x)$ of the fluctuating part $\delta f(x) = f(x)-\langle f\rangle$ of a 1D
signal at two points and spatially (then, if necessary, ensemble-) average them to obtain
the autocorrelation function:

$$\langle\delta f(x+r)\,\delta f(x)\rangle = \langle f(x+r)f(x)\rangle - \langle f\rangle^2. \tag{9a}$$

This well-known 2nd order 2-point statistic does not give us any new information since it
is related to the energy spectrum in Eqs. (7-8) by the Wiener-Khinchin (W-K) relation:

$$\langle f(x+r)f(x)\rangle - \langle f\rangle^2 = \int_0^\infty \cos(2\pi rk)\,E(k)\,\mathrm{d}k. \tag{9b}$$

At $r = 0$, the l.h.s. reduces to the 1-point variance, and the r.h.s. is the integral of $E(k)$; so
$E(k)\Delta k$ is simply the part of the variance that comes from scales $\approx 1/k$.

One can also form “increments,”

$$\Delta f(r;x) = f(x+r)-f(x), \tag{10}$$

in 1D and compute the 2nd-order “structure function,”

$$\langle\Delta f(r;x)^2\rangle = \langle[f(x+r)-f(x)]^2\rangle, \tag{11a}$$

also known as a “semi-variogram” (Christakos 1992). Here again there is a W-K relation
with the energy spectrum:

$$\langle\Delta f(r;x)^2\rangle = 2\int_0^\infty [1-\cos(2\pi rk)]\,E(k)\,\mathrm{d}k, \tag{11b}$$

as results from the identities $\langle\Delta f(r;x)^2\rangle = 2[\langle f^2\rangle-\langle f(x+r)f(x)\rangle] = 2[(\langle f^2\rangle-\langle f\rangle^2)-\langle\delta f(x+r)\,\delta f(x)\rangle]$.
Theoretically (i.e., when doing ensemble rather than spatial averages), these operations
are only meaningful in “broad-sense” stationary situations where $\langle\delta f(x+r)\,\delta f(x)\rangle$ depends
only on $r$. However, Eq. (11b) generalizes to nonstationary signals with (broad-sense)
stationary increments, namely, where $\langle\Delta f(r;x)^2\rangle$ is a function of $r$ alone.
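The identity linking the 2nd-order structure function to the autocorrelation function can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the data are made up, and the series is treated as periodic (indices wrap around) so that the spatial averages at $x$ and $x+r$ coincide and the identity holds exactly; that wrap-around is an assumption of this example, not something the text prescribes.

```python
def autocovariance(f, r):
    """Spatial average of df(x+r) * df(x), with periodic wrap-around."""
    M = len(f)
    mean = sum(f) / M
    return sum((f[(m + r) % M] - mean) * (f[m] - mean) for m in range(M)) / M


def structure_function_2(f, r):
    """Spatial average of [f(x+r) - f(x)]^2, with periodic wrap-around."""
    M = len(f)
    return sum((f[(m + r) % M] - f[m]) ** 2 for m in range(M)) / M


# Arbitrary toy series (hypothetical values).
series = [0.0, 1.0, 4.0, 2.0, 3.0, 1.0]
variance = autocovariance(series, 0)

# <Delta f^2> = 2 [variance - autocovariance(r)] at every lag r.
residuals = [structure_function_2(series, r) - 2.0 * (variance - autocovariance(series, r))
             for r in range(len(series))]
```

All entries of `residuals` vanish to rounding error, which is the physical-space statement that the two statistics carry the same 2nd-order information.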

In isotropic 2D situations, $\langle\delta f(x+ru)\,\delta f(x)\rangle$ or $\langle[f(x+ru)-f(x)]^2\rangle$ can be obtained by
averaging over the allowable domain of $x$ and the orientation of the unit vector $u$. This
straightforward approach quickly becomes computationally intractable since it requires
~$N^2$ operations (where $N = M^2$ is the total number of points). In contrast, FFT
implementations of Eqs. (9b) or (11b) require only ~$N\ln N$ operations; in this case
however, we interpret $E(k)$ as the so-called “1D” spectrum obtained by averaging, rather
than just summing, $\langle|\tilde f(\mathbf{k})|^2\rangle$ on the r.h.s. of Eq. (8b). Another
approach (with only ~$N$ operations and generalizable to higher orders) is adopted in the
remainder of this study: to treat rows and columns as an ensemble of 1D datasets.¹

3.3  qth-Order  Structure  Functions 

How can one gain new information in the framework of 2-point statistics? Simply by
looking at moments of order $q \neq 2$. The random variables of interest are then $|\Delta f(r;x)|^q$
and averaging yields the $q$th-order structure function:²

$$\langle|\Delta f(r;x)|^q\rangle = \langle|f(x+r)-f(x)|^q\rangle. \tag{12}$$

Unfortunately,  we  lose  the  W-K  connection  and  the  computational  efficiency  of  FFTs  in 
2  or  more  dimensions.  However,  the  focus  on  increments  is  akin  to  a  high-pass  filtering. 
Therefore, at the cost of using the ~$N$ coefficients of a discrete wavelet decomposition of
$f(x)$ as surrogates for $\Delta f(r;x)$, the utilization of multiresolution analysis (Mallat 1989) will
lead to efficient computational algorithms.

What insight do we gain by varying the parameter $q$? Of all possible values that the
increment $\delta = |\Delta f(r;x)|$ can take, we can identify

• “typical” values that occur most frequently, being near the mode of the pdf $p_r(\delta)$,

• “mean” values that dominate the average for $q = 1$ (i.e., maximize $\delta\,p_r(\delta)$),

• “r.m.s.” values that dominate the average for $q = 2$ (i.e., maximize $\delta^2 p_r(\delta)$).

There are also ever larger and rarer values that dominate higher order statistical moments:
$q = 3, 4$, etc. So, increasing $q$ amounts to looking at the more extreme values of $|\Delta f(r;x)|$.
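A direct evaluation of the $q$th-order structure function of Eq. (12) is short enough to sketch. The names `structure_function` and `scaling_exponent` are hypothetical, the two-point slope estimate is a deliberate simplification of a proper log-log regression, and the ramp used as input is a synthetic sanity check rather than cloud data.

```python
import math


def structure_function(f, r, q):
    """Spatial average of |f[m + r] - f[m]|**q over all admissible m (q > 0)."""
    if r < 1 or r >= len(f):
        raise ValueError("r must satisfy 1 <= r < len(f)")
    increments = [abs(f[m + r] - f[m]) for m in range(len(f) - r)]
    return sum(d ** q for d in increments) / len(increments)


def scaling_exponent(f, q, r_small, r_large):
    """Two-point log-log slope of the q-th order structure function."""
    s_small = structure_function(f, r_small, q)
    s_large = structure_function(f, r_large, q)
    return (math.log(s_large) - math.log(s_small)) / (math.log(r_large) - math.log(r_small))


# Synthetic check: a linear ramp has |increment| = r everywhere, so the
# q-th order structure function is exactly r**q and the slope equals q.
ramp = [float(m) for m in range(64)]
slopes = [scaling_exponent(ramp, q, 1, 8) for q in (1, 2, 3, 4, 5)]
```

For a smooth signal such as the ramp the slopes grow linearly with $q$; departures from linearity on real data are what the higher-order moments are meant to reveal.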


¹In theory, sampling is poor (~$N$ out of ~$N^2$ possible events); but $N = M_x \times M_y$ is already large in general.
²Validation of structure function computation and sensitivity studies with respect to amount of data can be
performed with the help of the models presented in the Appendix, more specifically in §A.2 and §A.4.

This is akin to changing the experimental conditions (e.g., temperature $T$) and
observing changes in the state of a macroscopic physical system made of many
microscopic elements interacting with each other; for instance, a real gas or a
ferromagnet. By lowering $T$ the system can be forced into otherwise very unlikely
configurations. The observable (macroscopic) changes can be very subtle or extremely
dramatic, as when a phase transition occurs: e.g., a mole of H2O molecules goes from
vapor to liquid, to solid. Whatever the outcome, we learn more about the specificity of
the system (data) by exploring as large a range of $T$’s ($q$’s) as possible.

3.4  Running  Averages  and  the  qth-Order  Moments  of  the  Coarse-Grained  Field 

The  high-pass  filtering  implicit  when  taking  increments  over  various  scales  eliminates 
the large-scale mean value $\langle f\rangle$ from the picture. What about the converse operation,
computing local means (low-pass filter output) at various scales?

Consider for instance the running mean¹ of $f(x)$ over $[x,x+r)$:

$$\mu_f(r;x) = \frac{1}{r}\int_x^{x+r} f(x')\,\mathrm{d}x', \tag{13a}$$

for $x \in [0,L-r)$ and $r > 0$. In the 2D case, spatial averaging is over the square domain
$[x,x+r)\otimes[y,y+r)$ of area $r^2$. Consider also the running variance of $f(x)$ over $[x,x+r)$:

$$\sigma_f^2(r;x) = \frac{1}{r}\int_x^{x+r} f(x')^2\,\mathrm{d}x' - \mu_f(r;x)^2. \tag{13b}$$

In one application of the above running means and variances, we hold $x$ constant and vary
the scale $r$ continuously until some kind of “convergence” is obtained (cf. discussion of
Fig. 8a). Instead, we can think of $r$ as a fixed parameter; then $\mu_f(r;x)$ and $\sigma_f(r;x)$,
obtained in Eqs. (13a,b), are random numbers. From this perspective, we can study their
statistical properties by averaging over $x$, the position of the segment or square (cf.
discussion of Fig. 8b), as well as seek the correlations² between $\mu_f(r;x)$ and $\sigma_f(r;x)$.
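A discrete counterpart of the running mean and running variance in Eqs. (13a,b) can be written as follows. This is a sketch under simple assumptions: unit grid spacing, so the integrals become sums over `r` samples; the function name and the toy series are hypothetical.

```python
def running_mean_and_variance(f, x, r):
    """Mean and variance of the r samples f[x], ..., f[x + r - 1]."""
    if r < 1 or x < 0 or x + r > len(f):
        raise ValueError("window [x, x + r) must lie inside the data")
    window = f[x:x + r]
    mu = sum(window) / r
    variance = sum(v * v for v in window) / r - mu * mu
    return mu, max(variance, 0.0)   # guard against tiny negative rounding errors


toy = [1.0, 2.0, 3.0, 4.0]

# Fixed position, growing scale: the "convergence" view.
growing = [running_mean_and_variance(toy, 0, r) for r in (1, 2, 4)]

# Fixed scale, moving position: the "random numbers" view.
moving = [running_mean_and_variance(toy, x, 2) for x in (0, 1, 2)]
```

The first list corresponds to holding $x$ fixed and increasing $r$; the second to holding $r$ fixed and scanning $x$, which yields the samples one would then average or correlate.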

By resampling $x$ at intervals of length $r$ in Eq. (13a), we obtain a “coarse-grained”
version of the original field $f(x)$, with $L/r$ pixels rather than $L/\ell$. This is of particular
interest for the absolute gradients defined in Eq. (2). In the 1D continuum limit used in
Eq. (13a), we define

$$\varepsilon(r;x) = \mu_\varepsilon(r;x) = \frac{1}{r}\int_x^{x+r} \varepsilon(x')\,\mathrm{d}x', \quad r > 0, \tag{14a}$$


¹In the language of continuous wavelet transforms, Eq. (13a) is the projection of the signal $f(x)$ onto a
functional space of “scaling functions.” In this case, the scaling function equals $1/r^d$ on a compact support
of measure $r^d$ and 0 elsewhere; Gaussian and otherwise variable “windows” have also been considered.

²This is the basis of the “spatial coherence” method developed by Coakley and Bretherton (1982) for
recovering fractional cloudiness from satellite radiances.

and similarly in 2D. This translates to

$$\varepsilon(r;x) = \frac{1}{r}\sum_{x'=x}^{x+r-1} \varepsilon(1;x'), \quad r \ge 1, \tag{14b}$$

in the discrete case where the small-scale $\varepsilon(1;x)$ field is defined on a grid with constant
$\ell = 1$; in other words, units of length where $\eta = 1$ are employed in Eq. (2) when used to
obtain $\varepsilon(1;x)$ from $f(x)$. Generalization from 1D to 2D is again straightforward and, as for
the wavelet-based surrogates for the $\Delta f(r;x)$’s mentioned in §3.3, efficient computation of
$\varepsilon(r;x)$ at appropriately selected $x$’s can be implemented in the framework of
multiresolution analysis (cf. graphics by Davis et al. (1994a)).

Having obtained the non-negative random numbers in Eqs. (14a,b), we can take their
$q$th powers and average the results over $x$, the positions of segments or squares, to obtain
$\langle\varepsilon(r;x)^q\rangle$ with the same advantages in terms of experimental “temperature” control as
discussed earlier for $\langle|\Delta f(r;x)|^q\rangle$.
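The coarse-graining of Eq. (14b) followed by $q$th-order averaging can be sketched as below. The function names and the toy $\varepsilon$-field are hypothetical, and $x$ is resampled every $r$ so that the boxes are disjoint; any samples left over when the length is not a multiple of $r$ are simply dropped in this sketch.

```python
def coarse_grain(eps, r):
    """Average a non-negative field over disjoint boxes of r samples each."""
    if r < 1 or r > len(eps):
        raise ValueError("r must satisfy 1 <= r <= len(eps)")
    n_boxes = len(eps) // r
    return [sum(eps[j * r:(j + 1) * r]) / r for j in range(n_boxes)]


def coarse_grained_moment(eps, r, q):
    """Spatial average of the q-th power of the r-scale coarse-grained field."""
    boxes = coarse_grain(eps, r)
    return sum(e ** q for e in boxes) / len(boxes)


# Toy small-scale field (hypothetical values), length 8 = 2**3.
eps_field = [1.0, 3.0, 0.0, 4.0, 2.0, 2.0, 5.0, 1.0]
first_order = [coarse_grained_moment(eps_field, r, 1) for r in (1, 2, 4)]
second_order = [coarse_grained_moment(eps_field, r, 2) for r in (1, 2, 4)]
```

With disjoint boxes that tile the data exactly, the $q = 1$ moment is the same at every scale, whereas the $q = 2$ moment can only decrease as the averaging smooths out the largest values.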


4.  Results  (Power-Law  Regimes  and  Scale-Breaks) 

In  this  section,  we  examine  the  output  of  our  laboratory  “instruments.”  These 
“measurements”  are  a  new  — and  highly  compressed —  form  of  data:  statistical 
quantities, always parameterized by scale. Typically, the scale parameter $r$ or $k$ spans a
large range of values and, in general, the statistics (e.g., $E(k)$, $\langle|f(x+r)-f(x)|^q\rangle$, or $\langle\varepsilon(r;x)^q\rangle$)
do too. The usual way of visualizing quantities with large ranges is to use log-log plots
and we naturally ask: Are there significant ranges of scale where log(statistic) is linear
in log($r$)?¹ This is often the case and they are called “scaling” regimes. In such regimes
the important parameter is the slope on the log-log plot, equivalently, the exponent in the
associated power law. In this section, we introduce notations² for a number of exponents,
present results for our cloud-related test data and discuss their most striking features.

4.1  Scaling  in  Spectral  Analysis 

Figure 4a shows octave-binned energy spectra for the five 1D datasets in Table 1,
partially illustrated in Figs. 2a-e. Log-log axes are used and the $k$-ranges are different
due to the different lengths of the datasets ($L_2$’s). We see good scaling in all cases, at
least for $k_N/10^2 < k < k_N$ (last 7 octaves), with good agreement in the prefactors for 4 of
the datasets. The dispersion at small $k$ (large scales) reflects the visual diversity of Figs.
2a-e. The odd dataset is also the longest ($L$ = 330 km, about a half of all the FIRE LWC
data), only an eighth of which is illustrated in Fig. 2c. Apart from the suspicious dip
visible in that figure (but not incorporated in our analyses), this data looks very smooth.
Spectrally, this translates in two ways: (1) the prefactor in the scaling regime is

¹Validation of these computations and sensitivity studies with respect to amount of data can be performed
with the help of models presented in the Appendix, specifically in §A.3.

²Unfortunately, there are several co-existing standards in the literature.

significantly smaller than for the four other datasets; (2) there is a specific scale $1/k$ at
which the Fourier modes stop increasing with scale and become constant. This is known
as an “integral” (correlation) scale, denoted $R$;¹ we estimate $R \approx$ 20-40 km.

Figure 4b shows our results for octave-binned $E(k)$ for the five absolute next-neighbor
gradient fields associated with the LWC data analyzed in Fig. 4a. We notice that the
behavior is somewhat more erratic than in Fig. 4a but still shows a reasonable correlation
between $\log E(k)$ and $\log k$ for at least two decades in scale (≈7 octaves).

Figure 4c shows the octave-binned estimates of the ensemble-average spectra for the
LWC fields $f_i(x)$ ($i = 1,\dots,5$) and for their absolute small-scale gradients $\varepsilon_i(\ell;x)$, from Eq.
(2) with $\eta = \ell$ (in this special case, the second subscript is unnecessary). For both
spectra, we see power-law behavior over no longer two but three decades (10 octaves) in
scale. In other words, the realization-to-realization variability in $E(k)$ at large scales has
been damped by the ensemble-averaging. Unfortunately, the break at 20-40 km is
not apparent in the average data because it starts at $k = 1/\min\{L_2\}$ where $\min\{L_2\}$ = 40
km according to Table 1. However, we are confident that the transition from increasing
to constant variance as scale increases ($k$ decreases) will be consistently observed,
whether at 20-40 km or more, as longer LWC datasets become available. This is because
LWC is bounded on physical grounds: it is non-negative by definition with an upper
bound which is some small portion of the total amount of water vapor that the laws of
thermodynamics allow a column of atmosphere to contain. This means that Fourier
modes cannot be arbitrarily strong; a cut-off must occur when they reach the
climatological mean LWC, otherwise fluctuations overwhelm the dc ($k = 0$) component
and the bounds will be exceeded.

The spectral scaling exponent $\beta$ in

$$E(k) \sim k^{-\beta} \tag{15}$$

has quite different values for the two types of data: ≈1.4 for the LWC, and ≈0.7 for the
associated $\varepsilon$-fields; we discuss the statistical significance of this difference in §5.1.
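Estimating the exponent in Eq. (15) amounts to fitting a straight line on a log-log plot. The sketch below uses an ordinary least-squares slope; the function name is hypothetical, and the input is a synthetic, noise-free power law on octave-spaced wavenumbers (not the measured spectra), chosen only so that the expected answer is known in advance.

```python
import math


def spectral_exponent(ks, Es):
    """Least-squares estimate of beta in E(k) ~ k**(-beta) from (k, E) pairs."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(E) for E in Es]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope   # the spectrum decays, so beta is minus the log-log slope


# Synthetic octave-binned spectrum with a known exponent of 1.4.
ks = [2.0 ** i for i in range(8)]
Es = [3.0 * k ** (-1.4) for k in ks]
beta = spectral_exponent(ks, Es)
```

On real spectra the fit should be restricted to the range of $k$ where the scaling is judged to hold, since the breaks discussed in the text would otherwise bias the slope.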

Figure 5 shows $E(k)$ vs. $k$ in log-log axes for the l.h. half of the Landsat image in
Fig. 3. In this case, there is so much averaging involved that octave-binning is not
required to reduce the statistical noise. We see a clear-cut break at the integral scale:
roughly constant Fourier amplitudes ($\beta \approx 0$) for scales larger than ≈20 km. There is
also a break at ≈0.2 km, a scale we denote $\eta_{rs}$. Between $1/R$ and $1/\eta_{rs}$, $E(k)$ follows a
power law with $\beta \approx 2.0$ and beyond $1/\eta_{rs}$ a steeper law, with an exponent in excess of 3,
the differentiability limit. In other words, small-scale variability is smoother than
expected.²

Davis et al. (1996b) investigate the scale break at $\eta_{rs}$ theoretically, relating it to a
radiative smoothing phenomenon mediated by horizontal photon transport by multiple
scattering. One outcome of this study is a simple expression for the “radiative
smoothing” scale $\eta_{rs}$ in terms of mean optical and geometrical cloud properties.


¹Because very long records are needed, it is not always easy to estimate $R$ in large geophysical systems.
²At the very smallest scales (≈60 m, twice the pixel scale), another flattening occurs due to digitization.

Figure 4: Energy Spectra of 1D LWC data. (a) Five individual wavenumber spectra for the LWC datasets
listed in Table 1, using octave-bins and log-log axes. (b) Same as (a) but for the absolute small-scale
gradient fields. (c) Ensemble-average spectra from the data in panels (a) and (b), weighted by the overall
lengths in Table 1.

Figure 5: Energy Spectrum of 2D Radiance data. Log-log plot of the wavenumber spectrum of the
Landsat data in Fig. 3.

[Axis label: wavenumber $k$ (km⁻¹)]

[Axis label: $\log_2(r/\ell)$, $\ell$ = 5 m]

Figure 6: Structure Functions of 1D LWC data. (a) Scaling for $q$ = 1(1)5. (b) $\zeta(q)$ exponents.

[Axis label: $\log_2(r/\ell)$, $\ell$ = 30 m]

Figure 7: Structure Functions of 2D Radiance data. (a) Scaling for $q$ = 1(1)5. (b) $\zeta(q)$ exponents.

4.2 Multifractal Analysis 1, Higher-Order Structure Functions

Figure 6a shows the ensemble-average $\log\langle|f(x+r)-f(x)|^q\rangle$ vs. $\log r$ for integer-valued
$q = 1,\dots,5$ using the 1D LWC data as input. We note that by taking $r = \ell$, $r = 2\ell$ ($= 1/k_N$),
$r = 4\ell$, etc., up to $r = \min_{i=1,\dots,5}\{L_{2i}\}$, we obtain one more data point than with $E(k = 1/r)$,
namely, at the pixel scale $\ell$. This extra datum enables us to see some evidence of a break
in the scaling at ≈$2\ell$-$4\ell$ (10-20 m). At any rate, we see the break at the integral scale
(i.e., $R \approx 20$ km) better than in the spectral data. Between these two limits, we have
power laws for every $q$. Adopting the notation used in the turbulence literature, we have

$$\langle|f(x+r)-f(x)|^q\rangle \sim r^{\zeta(q)}. \tag{16}$$

Figure 6b shows $\zeta(q)$ vs. $q$ for $0 \le q \le 5$. Muzy et al. (1993) have developed an elegant
method for estimating $\zeta(q)$ directly from the modulus of continuous wavelet transforms.

Figure 7a shows the scaling of the $q$th-order structure functions for the l.h. half of the
2D data in Fig. 3, showing good scaling between $\eta_{rs}$ and $R$ and where both the
scale-breaks, first characterized spectrally in Fig. 5, are apparent. (Here however, we would
significantly underestimate the integral scale $R \approx 20$ km by a factor of ~2 due to the
saturation at $f(x) = 255$.) Figure 7b shows the corresponding $\zeta(q)$ function.

We note the concavity¹ of $\zeta(q)$ and, since $\zeta(0) = 0$ to ensure proper normalization, we
can define a non-increasing hierarchy of exponents:

$$H(q) = \frac{\zeta(q)}{q}. \tag{17}$$

Data like ours yielding non-constant $H(q)$, hence a non-linear $\zeta(q)$, is “multifractal” in the
sense of Parisi and Frisch (1985); otherwise (i.e., when $\zeta(q) \propto q$), it is “monofractal.”

Two values of $q$ are of particular interest. At $q = 2$, we retrieve the scaling for the 2nd
order structure function; for power law statistics, the W-K relation in Eq. (11b) yields
(e.g. Monin and Yaglom 1975, p. 92)

$$\beta = 2H(2)+1 = \zeta(2)+1. \tag{18}$$

Incidentally, relation (18) is well verified numerically by our data, especially if
octave-bins are used in the spectral analysis (Davis et al. 1996a).²

At $q = 1$, we can retrieve the fractal dimension $D_g$ of the graph of the data, viewed as
a set $g = \{x, f(x)\}$ embedded in $d+1$ dimensions (e.g. Falconer 1990):

$$D_g = (d+1)-H(1) = (d+1)-\zeta(1). \tag{19}$$

^This  property  follows  from  characteristic  function  theory  (cf.  Feller  1971).  The  Fourier  transform  of  the 
pdf  of  a  real  random  variable  ^  is  its  characteristic  function  ^(t)  =  (exp(i7^)).  Non-negativity  of  the  pdf 
implies  that  (])(/«)  is  real  and  non-increasing  for  real  u.  By  the  same  token,  ln(|)(0,  the  cumulant-generating 
function  of  is  convex  for  purely  imaginary  t.  Setting  ^  =  lnl/(jc+r)-/(j^)l  for  fixed  r  and  t  =ig\nr,  we  see 
from  Eq.  (16)  that  is  proportional  to  a  cumulant-generating  function,  hence  the  concavity  of  ^(^). 
²This constitutes an internal calibration of the instrumentation in our data analysis lab; see Appendix for
external calibration procedures using standard input.

We can therefore equate $H(1)$, the “mean Hölder exponent,” to the codimension of the
graph of the data. This quantity has natural bounds. If $H(1) = 0$, then $D_g = d+1$: the
graph fills a finite portion of $\mathbb{R}^{d+1}$. If $H(1) = 1$, then $D_g = d$: the graph is as smooth as
the support of $x$, namely, a Euclidean line or plane.
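The relations (17)-(19) between $\zeta(q)$ and the derived exponents are simple enough to tabulate in code. In the sketch below the function names are hypothetical, and the input $\zeta(q) = q/3$ is the textbook Kolmogorov (monofractal) scaling of turbulent velocity, used purely as a familiar test case; it is not a result for the cloud data.

```python
def hierarchy_exponent(zeta, q):
    """H(q) = zeta(q) / q, Eq. (17); q must be nonzero."""
    return zeta(q) / q


def spectral_exponent_from_zeta(zeta):
    """beta = zeta(2) + 1, Eq. (18)."""
    return zeta(2.0) + 1.0


def graph_dimension(zeta, d=1):
    """D_g = (d + 1) - zeta(1), Eq. (19)."""
    return (d + 1) - zeta(1.0)


def zeta_kolmogorov(q):
    """Monofractal test case: zeta(q) = q / 3 (assumed here for illustration)."""
    return q / 3.0


H_values = [hierarchy_exponent(zeta_kolmogorov, q) for q in (1.0, 2.0, 3.0)]
beta_k41 = spectral_exponent_from_zeta(zeta_kolmogorov)
D_g_k41 = graph_dimension(zeta_kolmogorov, d=1)
```

For this linear $\zeta(q)$ the hierarchy $H(q)$ is constant at 1/3, the spectral exponent is 5/3, and a 1D transect has graph dimension 5/3, all within the bounds $d \le D_g \le d+1$ stated above.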

4.3 Running Means and Variances

Figure 8a demonstrates the practical importance of the integral scale $R$ (a 2-point
statistical construct) for a 1-point statistic which is a priori simpler. We have plotted
$\mu_f(r;x)$ vs. $r$ for 14 different non-overlapping 40 km sections from the LWC database.
Running means begin to stabilize only at $r$ = 20 km, some to values quite far from the
ensemble-mean of 0.29 g/m³ (beyond one ensemble-$\sigma$). This tells us two things. First,
we need a stretch of at least one or two integral correlation lengths before we can even
talk about a mean, even locally. Second, this data is highly non-ergodic: no physically
attainable length of LWC data seems to be enough to reach a “climatological” average.¹

Figure 8b shows $\sigma_f(r;x)$ vs. $r$ for the same 14 sections, this time in log-log axes. The
striking feature is the jumpiness of running variance due to localized events, clearly the
intermittently distributed clusters of down-spikes in Figs. 2a,b,d,e; in one case, a factor
of ~2 is gained even after accumulating for ~30 km. Such jumps are very unlikely in a
process obeying Gaussian statistics. The r.m.s. ensemble-average, $\sqrt{\langle\sigma_f^2(r;x)\rangle}$, is also
plotted (bold dots): for $r$ between 0.15 and 20 km, it follows a power law in $r^{0.2}$. The
exponent is numerically identical to $\zeta(2)/2 = (\beta-1)/2$, not a coincidence: $\langle\sigma_f^2(r;x)\rangle$
should be of the same order of magnitude as $\langle[f(x+r)-f(x)]^2\rangle$. Here again, no convergence
to the estimated ensemble-mean $\sigma$ (0.05 g/m³) is in sight, even at $r$ = 40 km > $R$.


Figure 8: Running Averages for 1D LWC Data. (a) $\mu(r;x)$ vs. $r$ for $0 < r < 40$ km and 15 different locations
separated by 40 km or more. (b) Same as (a) but for $\sigma(r;x)$.

¹We are in no position to claim that 0.29 g/m³ is climatologically relevant, not even for marine Sc in
summer off of southern California (where FIRE was conducted in June-July 1987). Moreover, even for a
given cloud-type, location, and season, we have 5 means (also true of characteristics described below).

4.4 Multifractal Analysis 2, Singular Measures

Figure 9a shows the spatial/ensemble-average¹ $\log_2\langle\varepsilon(r;x)^q\rangle$ vs. $\log_2 r$ for
integer-valued $q$ and $\log_2(r/\eta) = 0,\dots,[\min_{i=1,\dots,5}\log_2(L_{2i}/\eta)]$ using the LWC data in Eq.
(2) with $\eta = 4\ell$ (20 m) and then Eq. (14b). For $r > \eta$, we have power laws for all $q$’s;
adopting the notation of Schertzer and Lovejoy (1987), we posit:

$$\langle\varepsilon(r;x)^q\rangle \sim r^{-K(q)} \tag{20}$$

for the $q$th-order moments of the coarse-grained measures. In Fig. 9b we show $K(q)$
versus $q$ for $0 \le q \le 5$.

An  often  used  alternative  to  the  <f-dimensional  coarse-grained  measure  e(r;j:)  is  the 
total  measure  in  the  interval  [jc,jc+r)  or  domain  [;r,x+r)®[x,;c+r): 

p(rpc)  =  7^e(r;j:).  (21a) 

In  this  case,  sums  — not  averages —  of  the  qih.  power  of  p{r\x)  over  the  (L2/r)^  disjoint 
intervals  are  used: 


X 


dXb) 


where  <•)  now  designates  an  (optional)  ensemble-average  over  different  reali:^ions  of  e. 
By  separating  spatial  and  ensemble  averages  in  (20)  (L2/r)^<e(rpf)^>  =  r'^^^{p{r^)‘^). 
Hence,  from  Eqs.  (20)  and  (21£>),  we  find  ^ 

%(q)  =  (q-l)d-K{q).  (22) 

Methods for estimating $\tau(q)$ using continuous wavelet transforms have been developed by
Arneodo et al. (1988).
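
As a rough illustration of how K(q) and $\tau(q)$ might be estimated in practice, the following Python
sketch coarse-grains a normalized 1D field over disjoint boxes, fits the slope of
$\log_2\langle\varepsilon(r;x)^q\rangle$ against $\log_2 r$ as in Eq. (20), and then applies Eq. (22). The function names,
the use of NumPy, and the deterministic binomial cascade used as test input are illustrative
assumptions, not the procedure applied to the LWC data.

```python
import numpy as np

def coarse_grain(eps, r):
    """Average a 1D field over disjoint boxes of r grid points."""
    n = (len(eps) // r) * r              # drop an incomplete trailing box
    return eps[:n].reshape(-1, r).mean(axis=1)

def estimate_K(eps, qs, scales):
    """Fit <eps(r;x)^q> ~ r^(-K(q)), Eq. (20), by log-log regression."""
    eps = np.asarray(eps, dtype=float)
    eps = eps / eps.mean()               # normalize so that <eps> = 1
    log_r = np.log2(scales)
    K = []
    for q in qs:
        log_m = [np.log2(np.mean(coarse_grain(eps, r) ** q)) for r in scales]
        slope = np.polyfit(log_r, log_m, 1)[0]
        K.append(-slope)
    return np.array(K)

def tau_from_K(K, qs, d=1):
    """Eq. (22): tau(q) = (q - 1) d - K(q)."""
    return (np.asarray(qs, dtype=float) - 1.0) * d - np.asarray(K)

def binomial_cascade(p, n_levels):
    """Hypothetical test field: deterministic binomial cascade, weights 2p and 2(1-p)."""
    eps = np.ones(1)
    for _ in range(n_levels):
        eps = np.column_stack((2 * p * eps, 2 * (1 - p) * eps)).ravel()
    return eps

qs = [0, 1, 2, 3]
scales = 2 ** np.arange(0, 8)            # box sizes in grid points
eps = binomial_cascade(0.3, 12)
K = estimate_K(eps, qs, scales)
tau = tau_from_K(K, qs)
```

For this synthetic field, K(0) and K(1) vanish, as normalization requires, and K(2) equals
$\log_2(1.16) \approx 0.214$, the value expected analytically for cascade weights 0.6 and 1.4.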

Two K(q) values are predetermined. Normalization requires K(0) = 0 ($\tau(0) = -d$).
Only at q = 1 can we permute the spatial averaging inside the interval [x,x+r) and the
spatial average over the various x's. In the convention where x is sampled every r, there
is no difference in the outcome as r is varied: K(1) = 0 ($\tau(1) = 0$).

We note the convexity of K(q) in Fig. 9b, the associated $\tau(q)$ being concave. This
remarkable property is traceable to the same probabilistic cause as for $\zeta(q)$'s concavity.
Here again this can be used to define the non-increasing hierarchy of exponents:

$$D(q) = \frac{\tau(q)}{q-1}, \qquad (23)$$


the generalized dimensions introduced originally by Grassberger (1983) and Hentschel and
Procaccia (1983) for the characterization of strange attractors in deterministic chaos
theory.² Some dimensions are noteworthy; in standard terminology, we have:

• $D(0+) = d + K(0+) = -\tau(0+)$, dimension of the support;³

• $D(1) = d - K'(1) = \tau'(1)$, information dimension;

• $D(2) = d - K(2) = \tau(2)$, correlation dimension.

¹It is customary to resample x every r (from 0 to L-r) in the spatial part of the averaging procedure; this
guarantees that no data is used more than once for a given statistic.

²In that context, p(r;x) is the number of points sampled in the phase space of the dynamical system.

At q = 2, the scaling of $\langle\varepsilon(x+r)\varepsilon(x)\rangle$ can be related to that of $\langle\varepsilon(r;x)^2\rangle \sim r^{-K(2)}$, at least for
multiplicative cascade models (see Appendix, §A.3.2). From the power-law translation
of the Fourier duality in Eq. (9b), we have

$$\beta_\varepsilon = 1 - K(2) = 1 - [d - \tau(2)] = (1-d) + D(2). \qquad (24)$$

A fourth value has attracted considerable attention:

• $D(q_D) = 0$ or $\tau(q_D) = 0$ or $K(q_D) = (q_D-1)d$ define the “critical” moment $q_D$.

For $q > q_D$, the moments of $\varepsilon(r;x)$ are divergent (Mandelbrot 1974, Kahane and Peyrière
1976, Schertzer and Lovejoy 1987, Gupta and Waymire 1993, and others).

Data like ours, yielding a nonlinear K(q) hence a non-constant D(q), are “multifractal”
in the sense of Halsey et al. (1986), Meneveau and Sreenivasan (1987a,b), Schertzer and
Lovejoy (1987), Evertsz and Mandelbrot (1992), and others. If D(q) = constant, K(q) and
$\tau(q)$ are proportional to q-1, and the data are said to be “monofractal.”
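
To make the distinction concrete, the sketch below evaluates $D(q) = d - K(q)/(q-1)$, which follows
from combining Eqs. (22) and (23), for two simple forms of K(q): the log-normal form
$K(q) = C_1(q^2-q)$ and the monoscaling form $K(q) = C_1(q-1)$. The value $C_1 = 0.1$ and the function
names are illustrative assumptions only.

```python
import numpy as np

def generalized_dimensions(K, qs, d=1):
    """D(q) = d - K(q)/(q - 1); q = 1 is excluded here (there D(1) = d - K'(1))."""
    qs = np.asarray(qs, dtype=float)
    if np.any(qs == 1.0):
        raise ValueError("q = 1 requires the derivative form D(1) = d - K'(1)")
    return d - K(qs) / (qs - 1.0)

C1 = 0.1                                     # illustrative value only
lognormal = lambda q: C1 * (q**2 - q)        # nonlinear K(q): multifractal
monoscaling = lambda q: C1 * (q - 1.0)       # K(q) proportional to q - 1: monofractal

qs = np.array([0.5, 2.0, 3.0, 4.0])
D_multi = generalized_dimensions(lognormal, qs)    # equals d - C1*q here
D_mono = generalized_dimensions(monoscaling, qs)   # constant, d - C1
```

The first case gives a strictly decreasing D(q), the second a constant one, which is the
operational difference between “multifractal” and “monofractal” data.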


Figure 9: Singular Measures for 1D LWC Data. (a) Scaling for q = 1(1)5. (b) K(q) exponents.
[Axis labels recovered from the figure: K(q); log(r/l), l = 5 m.]


4.5  Bi-Fractal  Analysis 

Is it strictly necessary to have two kinds of multifractal analyses? In other words: Is
there a general connection between $\zeta(q)$ and K(q)? This is an open question discussed at
length in the specialized literature (Schertzer and Lovejoy 1987, Meneveau and
Sreenivasan 1991, Sreenivasan 1991, Frisch 1991, Davis et al. 1993, Vainshtein et al.
1994), the consensus being that at least one extra exponent is needed to go from the
currently fashionable K(q)-based approach to the more traditional one based on structure
functions and $\zeta(q)$. Let this exponent be

$$H_1 = H(1) = \zeta(1); \qquad (25a)$$

Bertozzi and Chhabra (1994) show that $\kappa = 1-H_1$ is the “cancellation” exponent recently
introduced by Ott et al. (1992) to measure the rate at which f(x) goes from an up-trend to
a down-trend. We prefer to view the (mean) Hölder exponent $H_1$ as an index of
nonstationarity: the degree of nonstationarity increases with $H_1$, ranging from 0
(stationarity) to 1 (differentiability).⁴

³q = 0+ means that $\varepsilon(r;x) > 0$ gives 1 and $\varepsilon(r;x) = 0$ gives 0. Note that K(0+) < 0, due to convexity.

Another question of practical importance is: How many q's do we actually need in
any given application? It is impossible to answer this question on general grounds.
However, we consider it important in any application to get at least a first-order estimate
of the degree of intermittency in the system. Using the small-scale absolute gradients is a
convenient way of doing this and a 1st-order index of intermittency is the information
codimension

$$C_1 = d - D(1) = d - \tau'(1) = K'(1). \qquad (25b)$$

At $C_1 = 0$, there is no intermittency. At $C_1 = d$, intermittency is maximal, corresponding
to a situation where all the measure is concentrated onto a finite number of points (cf.
Dirac measures in the Appendix).

Relations of the type

$$\zeta(q) = q/a - K(q/b) \qquad (26)$$

have been proposed where a and b are constants. For instance, a = b = 3 for turbulent
signals (e.g., Sreenivasan 1991) and a < b for a model of Schertzer and Lovejoy's (1987)
described in the Appendix. The generality of this $\zeta(q) \leftrightarrow K(q)$ connection remains an
open question.⁵ If however Eq. (26) is either true in general, accepted as an
approximation for low enough q's, or used as a definition of K(·) after setting b, then one
can derive both $H_1$ and $C_1$ from structure functions alone, without resorting to measures
based on gradients. Indeed, Eqs. (25-26) yield

$$\begin{cases} H_1 = \zeta(1) = 1/a - K(1/b) \\ C_1 = K'(1) = b/a - b\,\zeta'(b) \end{cases} \;\Longrightarrow\; \begin{cases} H_1 = \zeta(1) = 1/a \\ C_1 = H_1 - \zeta'(1) \end{cases} \text{ for } b = 1. \qquad (27)$$

Notice that if $\zeta(q) \propto q$, then $K(q) \equiv 0$, hence $C_1 = 0$; so $C_1$ is a measure of the curvature
in $\zeta(q)$. As an example, we obtain $C_1 \approx 0.06$ in this manner for the 2D Landsat data using
b = 1 for simplicity (cf. Fig. 7).
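
A minimal numerical sketch of this recipe follows, under the assumption b = 1. Differentiating
Eq. (26) at q = 1 and using K(1) = 0 then gives $H_1 = \zeta(1)$ and $C_1 = \zeta(1) - \zeta'(1)$. The derivative
is taken by finite differences on a tabulated $\zeta(q)$; the quadratic test curve is synthetic, built
from the Landsat values $H_1 = 0.54$ and $C_1 = 0.06$ quoted in the text, and the function name is
hypothetical.

```python
import numpy as np

def bifractal_parameters(qs, zeta):
    """Return (H1, C1) from tabulated zeta(q), assuming b = 1 in Eq. (26)."""
    qs = np.asarray(qs, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    dzeta = np.gradient(zeta, qs)            # finite-difference zeta'(q)
    H1 = np.interp(1.0, qs, zeta)            # H1 = zeta(1)
    C1 = H1 - np.interp(1.0, qs, dzeta)      # C1 = zeta(1) - zeta'(1)
    return H1, C1

# Synthetic check: zeta(q) = q/a - C1*(q^2 - q) with 1/a = 0.54, C1 = 0.06
qs = np.arange(0.0, 3.01, 0.5)
zeta = 0.54 * qs - 0.06 * (qs**2 - qs)
H1, C1 = bifractal_parameters(qs, zeta)
```

With real data the tabulated $\zeta(q)$ would come from fitted structure-function slopes, and the
accuracy of $C_1$ would depend on how finely q is sampled around q = 1.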

“Bi-fractal” analysis is the minimal form of multifractal analysis based solely on $H_1$
and $C_1$. [Davis and Marshak (1996) discuss, compare, and relate other choices used in
the literature.] Figure 10 shows a schematic $(H_1,C_1)$ plane for d = 1. In our experience
with marine Sc, bi-fractal analysis has proven very useful, leading in particular to new
insight into cloud dynamical processes. Table 2 gives $(H_1,C_1)$ coordinates for the data
presented in section 2.

⁴In §5.1 we will show that processes with $\beta < 1$ are stationary; those with $1 < \beta = 2H(2)+1 < 3$ are
nonstationary with stationary increments; those with $\beta > 3$ are almost everywhere differentiable and have
nonstationary increments. Since $H(2) \le H(1) = H_1$, we have $0 < H_1 < 1$.

⁵A methodology for empirical verification is suggested by Davis et al. (1993).

Figure 10: The Bi-Fractal Plane. This schematic shows how the vertical coordinate reflects the
intermittency of the squared small-scale gradients (lower inset) whereas the horizontal coordinate
characterizes the nonstationarity of the primary signal (upper inset). Synthetic data with increasing $C_1$ and
$H_1$ grace the axes. We see that, contrary to spectral analysis, bi-fractal analysis distinguishes Brownian
motion, $(H_1,C_1) = (1/2,0)$, from Heaviside steps, $(H_1,C_1) = (1,1)$, which are both nonstationary ($H_1 > 0$).
The same is true for the stationary ($H_1 = 0$) gradients of these two theoretical cases: white noise has
$(H_1,C_1) = (0,0)$, and a single spike, modeled by a $\delta$-function, has $(H_1,C_1) = (0,1)$.

Table 2: $(H_1,C_1)$ values for FIRE Landsat radiances and LWC, ASTEX and SOCEX LWC, and ARM LW
path data.

Database (dataset)               H_1    C_1    r_min / r_max    Reference
FIRE Landsat radiances (6/30)    0.54   0.06   150 m / 10 km    (this publication)
FIRE LWC (6/30, 22:41Z)          0.29   0.14   20 m / 20 km     Marshak et al. (1996)
FIRE LWC (7/02, 02:23Z)          0.22   0.15   "                "
FIRE LWC (7/14, 23:09Z)          0.34   0.03   "                "
FIRE LWC (7/16, 17:17Z)          0.31   0.08   "                "
FIRE LWC (7/16, 18:19Z)          0.34   0.07   "                "
FIRE LWC 1987 ensemble           0.28   0.10   "                "
ASTEX LWC 1992 ensemble          0.29   0.08   60 m / 60 km     Davis et al. (1994a)
SOCEX LWC 1994 ensemble          0.28   0.09   5 m / 5 km       (in preparation)
ARM LW path ensemble             0.37   0.08   1 min / 8 hr     Wiscombe et al. (1994)

The scatter, most notable in the intermittency index, reflects the diversity in Figs. 2a-e
and argues for a non-ergodic model for this data. FIRE was conducted off the coast of
Southern California, near San Diego. We have added to Table 2 entries for
ensemble averages from analyses of LWC in marine Sc for two other field programs: the
Atlantic Stratocumulus Transition Experiment (ASTEX) that focused on a more complex
situation with transitions to cumulus regimes, and the Southern Ocean Cloud
Experiment (SOCEX) that was conducted off the coast of Australia. The proximity of
the ensemble average points for the three different local climates argues for a
degree of universality in the dynamics that determine the internal structure of marine Sc.
The $H_1$'s for LWC cluster near 1/3, the value that characterizes turbulent fields: velocity
(Kolmogorov 1941), temperature, and passive admixtures (Obukhov 1949, Corrsin 1951).
The $C_1$'s for LWC cluster near 0.1, precisely in the range observed for turbulent
signals.⁶,⁷ Thus, although H2O (in all its phases) is far from being a dynamically passive
constituent of the atmosphere, it can be perceived as advected by the turbulent wind field
to a first approximation.

Finally, we have appended to the above LWC statistics an ensemble-average
for liquid water path (column integrated LWC) measured at the ARM site in Lamont
(Ok.) for arbitrary cloud cover, as opposed to Sc only. We find roughly the same $C_1$ but
a somewhat larger $H_1$ than for the three LWC observations. This is not surprising: the
vertical integration that relates LWC to LW path will produce a smoother signal. LW
path is more directly relevant than LWC to the radiative properties of the cloud, as
observed in the Landsat image. The $H_1$ for marine Sc radiance fields in visible channels
is larger still than for LW path, i.e., these fields are even smoother, because the escaping
radiation fields are highly scattered. Davis et al. (1996b) show that multiple scattering
leads to a non-trivial physical smoothing over scales ≈200-300 m, and Marshak et al. (1995b)
use multifractal analysis to show that this smoothing affects large and small jumps in the
horizontal distribution of LW path differently.

⁶$C_1$ values can be obtained from “intermittency parameters,” K(2), gleaned in the literature. To go from
the characterization at q = 2 to q = 1, two extreme hypotheses are log-normality, $K(q) = C_1 q(q-1)$, and
monoscaling, $K(q) = C_1(q-1)$. Allowing for this uncertainty ($C_1/K(2)$ = 1-2), $C_1$ falls in 0.2-0.3.

⁷In turbulence studies, the dissipation rate field is obtained by squaring the velocity gradients at the
Kolmogorov scale, leading to $K_2(q)$, rather than taking their absolute values, leading to $K_1(q)$, a difference
in methodology easily accounted for: Lavallée et al. (1993) show that $K_1(q) = K_2(q/2) - qK_2(1/2)$, which
puts $C_1 = K_1'(1)$ in the range 0.07-0.15.


5. SEMI-EMPIRICAL CRITERIA (Statistical Interpretation of Scaling Regimes and
Exponents)

In this section, we state or establish some results, typically inequalities between
exponents, that enable us to classify data as stationary or not, ergodic or not, intermittent
or not, according to the scaling behavior of various statistics. The same data can have
conflicting attributes (e.g., stationary and nonstationary) as long as they refer to different
ranges of scales. From a statistical standpoint, it will become apparent that stationarity,
ergodicity, and intermittency are just different facets of the basic issue of data analysis:
What properties should we determine from our data? and How accurately can we
estimate them from our finite sample? From a physical standpoint, stationarity and
intermittency are clearly more fundamental concepts than ergodicity.

5.1  Criterion  for  Stationarity  (A  Necessary  Condition  for  Ergodicity) 

In the Appendix, we describe procedures for synthesizing a number of scale-invariant
models for stochastic processes and discuss their properties, among these “stationarity.”
This body of theoretical knowledge is important for a variety of reasons. First, models
enable validation of analysis software (“instrumental calibration” in our laboratory
analogy) as well as sensitivity studies (how does the output depend on the amount and
properties of the data being processed?). Second, models can be used in applications
(e.g., cloud radiation studies). Last but not least, models generally have well-understood
properties. Strictly speaking, stationarity is a property that can only be assigned to a
model because the question is: Are statistical quantities, as defined by ensemble
averaging (i.e., over probability- or f-space), invariant under translation in x? This is
not easy to address with data because we always operate with finite amounts of data;
furthermore we rely heavily on spatial averages to estimate statistics in the first place, so
at least some x-dependence is operationally erased.

Another  theoretical  question,  more  directly  relevant  to  data  analysis,  is  that  of 
“ergodicity:”  Do  spatial  averages  of  increasing  length  for  a  single  realization  converge 
to  ensemble  averages  (over  all  possible  realizations)?  In  data  analysis  we  generally 
make  implicit  ergodicity  assumptions  before  computing  spatial  averages,  i.e.,  we  expect 
them  to  converge  to  something  meaningful.  We  also  assume  implicitly  that  this  definite 
number  we  are  seeking  does  not  depend  on  when  we  start  computing  it  (stationarity).  It 
is therefore important to have guidelines as to what quantities are statistically
well-defined, not just computable by some given algorithm.

Without exception, the scale-invariant models in the Appendix with

$$\beta < 1 \qquad (28a)$$

are stationary in the “broad” sense where $G(r) = \langle [f(x+r)-\langle f\rangle][f(x)-\langle f\rangle] \rangle$ depends only on r;
we recall that, in this theoretical context, the 2nd-order autocorrelation function is
obtained by averaging over all possible (or at least many) f's, holding x constant. Davis
et al. (1994b, 1996a) give more general arguments for drawing the line between
stationary and nonstationary behavior at $\beta = 1$ for scale-invariant processes.

If we only have

$$\beta < 3 \qquad (28b)$$

then the model has (broad-sense) stationary increments: $\langle f(x+r)f(x)\rangle = G(r)+\langle f\rangle^2$, and
even $\langle f(x)\rangle$, may or may not depend on x but the structure function of order q = 2, namely
$\langle [f(x+r)-f(x)]^2 \rangle$, does not. We propose to use the criteria in Eqs. (28a,b) for real world
data-streams as well as for theoretical models. For data, $\beta > 1$ means that spatial
estimates of $\langle f(x+r)f(x)\rangle$ are likely to vary from one realization (or portion of data) to the
next. For data, $\beta < 3$ means that spatial estimates of $\langle [f(x+r)-f(x)]^2 \rangle$ are likely to be
robust (i.e., invariant under addition of new data).
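
The criteria in Eqs. (28a,b) amount to a simple classification by spectral exponent, sketched
below. The function name and the returned labels are illustrative; the thresholds $\beta = 1$ and
$\beta = 3$ are those stated above, and the boundary values themselves are assigned to the less
stationary class by assumption. The three exponents used as input are the ones quoted in this
section for the FIRE data.

```python
def classify_stationarity(beta):
    """Classify a scale-invariant process by its spectral exponent beta."""
    if beta < 1:
        return "stationary"
    if beta < 3:
        return "nonstationary with stationary increments"
    return "nonstationary increments"

# Spectral exponents quoted in the text for the FIRE datasets
examples = {"LWC": 1.4, "Landsat radiance": 2.0, "LWC absolute gradients": 0.7}
labels = {name: classify_stationarity(b) for name, b in examples.items()}
```

Such a classification only applies over the range of scales where the power-law fit for $\beta$
holds, so the same dataset can receive different labels in different scaling regimes.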

In this scheme, the FIRE LWC data ($\beta \approx 1.4$) in Figs. 2a-e is nonstationary with
stationary increments for scales from ≈20 m to the integral scale ≈20-40 km. For larger
scales, we have no spectral information in Fig. 4c but the leveling-off of the structure
functions in Fig. 6a confirms this estimate of the integral scale. This is symptomatic of
stationarity (increments cease to grow). The FIRE Landsat data ($\beta \approx 2.0$) in Fig. 3 is also
nonstationary with stationary increments from the radiative smoothing scale ≈200 m to
≈10 km. Were it not for saturation at maximal gray level 255, spectral flattening
(transition to stationarity) would occur closer to the integral scale for the cloud LW path
fluctuations, itself likely to be around that of LWC.

In contrast, the $\varepsilon(\eta;x)$ fields in Figs. 2a'-e' ($\beta \approx 0.7$) are stationary in spite of their
intense spikiness. This may seem surprising since local means, $\varepsilon(r;x)$ with $r \gg \eta$, will
fluctuate wildly, depending on the strength and number of spikes that fall in the interval
[x,x+r). The strong variability of these fields therefore contrasts with the conventional
wisdom about stationarity, essentially that (in the usage of time-series analysis)

‘temporal statistics depend little on when they are gathered.’ (*)

The occurrence of spikes of course strongly perturbs the local statistics and therefore
violates this operational definition of stationarity. We prefer to think of property (*) as a
consequence of ergodicity (which is more restrictive than stationarity): if, in general,
running temporal averages converge reasonably fast to their ensemble counterparts as the
sample size increases, then clearly we need not worry about where we start cumulating.
The two unstated assumptions in effect here are:

1) 1-point statistics are Gaussian-type, i.e., only relatively small deviations from
mean or modal values are anticipated;

2) 2-point correlations are of short range, i.e., the integral scale is relatively small.
Samples of relatively small length therefore provide enough data to obtain accurate
estimates of 1-point and 2-point statistics of all orders.

In our outlook, stationarity should have no bearing at all on the intensity of the
fluctuations; it should however have a strong impact on their “rapidity” (since the
smaller the spectral exponent the more variance in the small scales). In short, we propose
to drop the above assumption #1. Even in strongly intermittent cascade processes $\varepsilon(x)$
with $\langle\varepsilon\rangle = 1$, spikes (where $\varepsilon(x) \gg 1$) do not prevent the signal from

‘having a well-defined mode,’ (†)

(which is naturally $\ll 1$). Property (†) is, in our opinion, a better description of a
stationary time-series, whether Gaussian/ergodic or not.⁸

5.2  The  Onset  of  Sampling  Problems  (Trivial  Ergodicity  Violation  by  Finite  Datasets) 

In  the  above  considerations,  only  2nd-order  statistics  are  used,  and  we  assume  that  the 
data  is  ergodic  for  that  type  of  statistic  (dominated  by  relatively  frequent  events).  Going 
to  ever  higher-order  moments,  extreme  events  gain  more  and  more  weight.  In  estimators 
based  on  a  finite  amount  of  data,  a  single  event  will  eventually  dominate.  This  can  be 
sensed in scaling analyses when the exponent functions, $\zeta(q)$ or K(q), become linear in q.
This is a gradual transition but one can nevertheless define $q_s$ such that

$\zeta'(q)$ or $K'(q) \approx$ constant for $q > q_s$.

Schertzer and Lovejoy (1992) derive an expression for $q_s$ in the frame of singular
measures. Now some singular cascade models for $\varepsilon(r;x)$ (e.g., “p-models” discussed in
the Appendix) are immune to sampling problems yet their K(q)'s become asymptotically
linear in q, simply because there is a well-defined maximal event present in every
realization. In practice, one can easily test the hypothesis of obtaining roughly the same
$\max_x\{\varepsilon(r;x)\}$ for every realization. So, in principle, there is no risk of misinterpreting the
observation of linear trends in K(q), at least for models.

Using the FIRE LWC data, Figs. 6b for $\zeta(q)$ and 9b for K(q) are quasi-linear for
q > 3, which is our estimate for $q_s$. The $\zeta(q)$ in Fig. 7b for the FIRE radiance data is
quite linear for small q's (see below) but does not show an asymptote for large q.

5.3  Criterion  for  Intermittency  (Non-Trivial  Ergodicity  Violation  by  Non-Gaussian 
Processes) 

Suppose we want to model data (or the geophysical field it samples) with the simplest
possible scale-invariant stochastic process. A question naturally arises: ‘Is a
non-Gaussian multifractal model strictly necessary or is a simpler monofractal one good
enough?’ Focusing on qth-order structure functions in Eq. (16), we find mono- (or
“simple”) scaling, namely $\zeta(q) = \zeta(1)q$, as soon as the trivial dimensional reasoning,
that is

$$\langle |f(x+r)-f(x)|^q \rangle \approx \langle |f(x+r)-f(x)| \rangle^q, \qquad (29a)$$

makes accurate predictions. Similarly, for singular measures, we would have

$$\langle \varepsilon(r;x)^q \rangle \approx \langle \varepsilon(r;x) \rangle^q, \qquad (29b)$$

hence $K(q) \approx K(1)q = 0$. We have argued above that, for all practical purposes (i.e., a
finite amount of data is available), Gaussian-type increments f(x+r)-f(x) or $\varepsilon(x)$ fields will
indeed yield good estimates of relatively high-order moments that obey (29a,b). This
type of data can be deemed “ergodic” and is necessarily non-intermittent in the sense that
K(q) is vanishingly small. It is therefore important to give a quantitative meaning to the
“≈” symbols in Eqs. (29a,b).

⁸A random sample is, by definition, dominated by modal values and will generally not give good estimates
of high-order statistics, even of the mean if the distribution is sufficiently broad or skewed.

In theory, fractal and multifractal properties require a small-scale limit (r→0) to
become mathematically precise. In practice, multifractal analyses are performed with
finite amounts of data with a finite range of scales. There will almost⁹ always be trivial
ergodicity violations based on extreme events, as described briefly in the previous
subsection. In turn, these events lead to a small but finite degree of multiscaling. Can we
distinguish between this spurious multifractality and its “real” counterpart (i.e., that is
likely to be robust under addition of new data)?

This exercise in ergodicity verification normally requires either obtaining more data
or subdividing the available data and re-doing the analyses in order to monitor the effect
of sample-size. Aurell et al. (1992), Marshak et al. (1994), Eneva (1994), and others
have explored this sampling issue analytically or numerically with specific models that
yield $K(q) \equiv 0$ in the small-scale limit but $K(q) \ne 0$ for a finite range of scales.
Grivet-Talocia (1995) and others have investigated finite-size effects that cause stationary
scaling processes with $\beta < 1$ to have small but finite $\zeta(1)$ and $\zeta(2)$, in contradiction with
Eq. (18), let alone $\zeta(q) \equiv 0$. A general (model-independent) first-order answer to the
above question is now derived from the hypotheses in Eqs. (29a,b).

Let $\xi(r;x) = |f(x+r)-f(x)|$ or $\varepsilon(r;x)$, depending on the multifractal approach of interest.
Equations (29a,b) become

$$\langle \xi(r;x)^q \rangle^{1/q} \approx \langle \xi(r;x)^{q_{ref}} \rangle^{1/q_{ref}}. \qquad (30)$$

For specificity, we can take $q_{ref} = 1$ but in some applications $q_{ref} = 2$ would be a natural
choice. In the following, it is assumed that some range $[q_{min}, q_{max}]$ is explored for a
number of scales, ranging from $r_{min}$ to $r_{max}$. To make the “≈” sign in Eqs. (29-30)
quantitatively meaningful, we just need to make sure that there is minimal information in
the prefactors on q-dependence. There often is (cf. Figs. 6a and 7a), but of a trivial kind,
just reflecting a poor choice of units. It is easy to select physical units for $\xi$ that remove
this trivial dependence for singular measures: by normalizing $\varepsilon(r;x)$ so that the
r-independent statistic $\langle\varepsilon(r;x)\rangle$ is unity (cf. Fig. 9a). For structure functions, there is no
r-independent case but we can always choose units that make $\langle |f(x+r_{max})-f(x)|^{q_{ref}} \rangle = 1$;
then, if there is little statistical information on q in the prefactors, the same should be
approximately true in general: $\langle |f(x+r_{max})-f(x)|^q \rangle \approx 1$. Visual inspection of log-log plots
of $\langle \xi(r;x)^q \rangle$ vs. r is enough to show that this is possible: the intercepts of regression lines
should be approximately linear in q.

⁹We exclude models that are cunningly ergodic in the sense of one or the other of the multiscaling statistics
(cf. “p-model” in the Appendix for singularity analysis).

Using the definitions in Eqs. (16) or (20), we can rewrite the constraint for
monoscaling in Eq. (30) as $(r/r_{max})^{A(q)} \approx (r/r_{max})^{A(q_{ref})}$ where

$$A(q) = \begin{cases} \zeta(q)/q = H(q) \\ -K(q)/q = \tau(q)/q + (1/q-1)d = (1/q-1)[d-D(q)] \end{cases} \qquad (31)$$

In the (lower) singular measures case, A(q) is a new non-increasing function.

Now we must decide at what degree of discrepancy in Eq. (30) we are compelled to
take multiscaling into account. There is no universal answer for this at present; in
particular, the number of samples will be a factor. However, there is a general consensus
that multifractals are a framework for modeling “strong” variability. For monoscaling to
remain acceptable, it seems reasonable to require that the ratio between both sides of
Eq. (30) stay below some threshold, say one order of magnitude or more, at the opposite
end of the range of scales from where their quantitative agreement can be imposed
arbitrarily, simply by using the proper units. We did this at $r = r_{max}$, so we will now
focus on $r = r_{min}$. Conversely, we can require that bona fide multiscaling data obey

$$(r_{max}/r_{min})^{|A(q)-A(q_{ref})|} \ge B, \qquad (32)$$

where B is an arbitrary but relatively “big” number, like 10 or 100 (depending on the
application, amount of data, etc.). Taking logs in Eq. (32) yields

$$\log(r_{max}/r_{min})\,|A(q)-A(q_{ref})| \ge \log B. \qquad (33)$$

The chance of passing this test is clearly greater if (i) the range of scales $r_{max}/r_{min}$ is
increased, or (ii) the range of q's is increased.

We now restrict our attention to singular measures with which we characterize
intermittency routinely. The criterion in Eq. (33) can be translated into another for $C_1$ if
we make an assumption on A(q), hence K(q). For small enough values of q, the
log-normal model, $K(q)/C_1 = q^2-q$ (cf. Appendix), is a reasonable fit to our generally
parabola-shaped K(q) curve (e.g., Fig. 9b); we can therefore take $A(q)/C_1 = 1-q$.
Combining this simple expression with the inequality in (33) and maximizing the range
of q (i.e., $q = q_{max}$, $q_{ref} = q_{min}$), this A(q) function yields a simple criterion (erring
somewhat on the conservative side) where we can isolate $C_1$ as the only data-dependent
quantity, the others being essentially instrumental: the data is “truly” multifractal only if

$$C_1 \ge \frac{\log B / \log(r_{max}/r_{min})}{q_{max}-q_{min}}.$$

We have $r_{max}/r_{min} \approx 10^3$ for most of the entries in Table 2. The denominator in the r.h.s.
is 5; taking $B \approx 10$, we require $C_1 > 0.07$ to qualify the data as truly multiscaling. One
case (FIRE LWC on 7/14) fails the multifractality test; being so exceptionally smooth,
this is not a surprise. The other cases pass, although one is just borderline.
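
The threshold arithmetic can be reproduced as follows. This sketch assumes the criterion
reads $C_1 \ge [\log B/\log(r_{max}/r_{min})]/(q_{max}-q_{min})$, uses the rounded values quoted above
($r_{max}/r_{min} = 10^3$, $q_{max}-q_{min} = 5$, B = 10), and applies the resulting bound to the five individual
FIRE LWC entries of Table 2. The function and variable names are hypothetical.

```python
import math

def c1_threshold(B, scale_ratio, q_range):
    """Smallest C1 that qualifies as 'truly' multifractal under the criterion."""
    return math.log10(B) / math.log10(scale_ratio) / q_range

threshold = c1_threshold(B=10, scale_ratio=1e3, q_range=5)   # about 0.067

# C1 values for the five FIRE LWC datasets in Table 2
fire_lwc_c1 = {"6/30": 0.14, "7/02": 0.15, "7/14": 0.03,
               "7/16 17:17Z": 0.08, "7/16 18:19Z": 0.07}
passes = {day: c1 >= threshold for day, c1 in fire_lwc_c1.items()}
```

Only the 7/14 case falls below the bound; the 7/16 18:19Z case ($C_1 = 0.07$) passes narrowly.
A larger B or a shorter scaling range would raise the bound accordingly.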

We noted already that the $\zeta(q)$ results in Fig. 7b for the FIRE radiance fields are
almost linear in q for small enough q (say, 0 to 2.5). Can we argue for multifractality
with this data? In this case, we have $r_{max}/r_{min} \approx 10^2$ (this is a somewhat conservative
estimate because of the large-scale effect of radiometry saturation). Values of A(q) for
$q_{ref} = 1$ and $q_{max} = 5$ are $A(1) = \zeta(1) = H_1 \approx 0.5$ and $A(5) = \zeta(5)/5 = 2.0/5$. Using base 10
logarithms, the l.h.s. of Eq. (33) is therefore $\approx 2^+ \times (0.5-0.4) = 0.2^+$, much less than the
r.h.s. for B = 10. This argues for a monofractal model for this data. This is a surprising
result since we have good evidence of multifractality in the associated LWC fields which are
representative of the fluctuations of the extinction (photon scattering probability per unit
of length) in the cloud. Numerical simulations by Marshak et al. (1995b) explain this
paradox: multiple scattering processes smooth the large jumps in extinction more
effectively than the small ones; the latter determine $H_1$ and the former, the higher-order
moments of the increments.
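
The same check for the radiance data can be written directly in terms of Eq. (33). The sketch
assumes that the left-hand side is $\log_{10}(r_{max}/r_{min})$ times the change in A(q) between $q_{ref}$ and
$q_{max}$, and plugs in the rounded values quoted above; the names are illustrative.

```python
import math

def multiscaling_lhs(scale_ratio, A_ref, A_q):
    """Left-hand side of the test: log10(r_max/r_min) * |A(q) - A(q_ref)|."""
    return math.log10(scale_ratio) * abs(A_q - A_ref)

lhs = multiscaling_lhs(scale_ratio=1e2, A_ref=0.5, A_q=2.0 / 5)   # about 0.2
rhs = math.log10(10)                                              # B = 10
is_multiscaling = lhs >= rhs
```

With these inputs the left-hand side is about 0.2 against a right-hand side of 1, so the test
is failed by a wide margin, consistent with the monofractal reading of the radiance data.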


6.  THEORETICAL  CONSIDERATIONS  (Geophysical  Data  Analysis  as  a  Problem  in 
Statistical  Physics) 

Experimentation  is  generally  conducted  in  the  laboratory  with  a  theoretical  model  in 
mind,  a  hypothesis  to  test.  At  the  level  of  generality  that  we  have  adopted  in  our  survey 
of multifractal data analysis techniques, the most pressing theoretical questions are: ‘Why
is scaling almost universally observed in geophysical signals and fields?’ and ‘What can
we learn about the underlying physical processes from the scaling properties?’.

6.1  Thermodynamical  Interpretation  of  Multifractal  Quantities 

In  §3.3  we  likened  the  statistical  parameter  q  with  a  standard  one  in  experimental 
work,  namely  ambient  temperature.  In  the  same  vein,  there  is  an  increasingly  popular 
interpretation  of  all  multifractal  exponents  as  thermodynamical  quantities,  first  explored 
by  Feigenbaum  (1987)  and  recently  surveyed  by  Muzy  et  al.  (1994).  In  particular, 
diverging  moments  (important  for  modeling  ergodicity/sampling  problems)  are  perceived 
as 1st- or 2nd-order phase transitions, their signature being a discontinuity in the 1st- or
2nd-order derivatives of the “equilibrium curves” $\zeta(q)$ or K(q); see Schertzer and
Lovejoy (1992) for a discussion in the frame of singular measures.

There  are  solid  physical  reasons  for  exploiting  this  formal  analogy.  Indeed,  the 
current  rationale  for  using  scaling  analysis  is  that  geophysical  systems  by  nature  have 
very  many  interacting  degrees-of-freedom.  We  can  think  of  the  number  of  computational 
cells  required  to  solve  the  coupled  PDEs  for  Navier-Stokes  equations,  generally  with 
ancillary  constraints,  for  very  high  Reynolds  numbers.  In  this  respect,  we  imagine  the 
generally  large,  complex  geophysical  system  under  consideration  as  a  thermodynamical 
one:  it  has  many  allowable  configurations  in  the  sense  that  all  the  microscopic  variables 
can  each  take  on  a  number  of  values.  Two  classic  examples  are:  positions,  linear-  and 
angular  momenta  of  molecules  in  a  gas;  spin  values  of  atoms  or  domains  in  a  magnetic 
material.  This  defines  a  vast  probability  space,  impossible  to  describe  in  any  kind  of 
detail.  However,  the  macroscopically  observable  quantities  are  few  (akin  to  temperature, 
free  energy,  entropy,  magnetic  induction,  etc.)  and  are  defined  by  ensemble-averages 
over  all  possible  microscopic  configurations.  These  “observables”  generally  depend  little 
on the detailed dynamics of the system and thus define universality classes (e.g., real
gases,  Ising  models).  The  counterpart  of  the  thermodynamical  limit  in  statistical  physics 
(i.e.,  very  many  interacting  particles)  here  is  the  limit  of  a  huge  computational  grid  (hence 
a  large  range  of  scales)  and  we  again  expect  some  kind  of  universal  behavior  to  arise. 

6.2  Information  Created  by  Breaking  the  Scaling  Symmetry 

Scaling  can  be  viewed  as  a  symmetry  (or  invariance)  obeyed  by  the  macroscopic 
geophysical  system  probed  during  data  collection.  In  this  case,  we  are  dealing  with  an 
invariance  under  change  of  scale.  There  are  other  possible  symmetries:  “stationarity” 
(§5.1),  invariance  under  translation;  and  “isotropy”  (§3.1),  invariance  under  rotation  in 
2D, time reversal or parity (x → -x) in 1D. Of course these symmetries are all of a
statistical  nature  since  exactly  translationally  and  rotationally  symmetric  fields  are 
constant.  Generally  speaking,  the  more  symmetric  the  system,  the  less  information  is 
required  to  describe  it.  In  our  case,  similar  statistical  properties  for  a  whole  range  of 
scales  are  described  by  a  single  exponent  (and  a  prefactor).  It  can  be  argued  that 
information  about  any  system  can  be  gained  only  by  breaking  its  symmetry.  For 
instance,  to  measure  the  circumference  of  a  circle,  a  mark  must  be  placed  somewhere 
(thus  breaking  its  rotational  symmetry).  The  first  and  last  points  in  a  time-series  are 
special  with  respect  to  translations  (a  degree  of  nonstationarity  is  therefore  introduced). 

Our  experimentation  with  cloud  data  has  confirmed  this.  In  discussing  Table  2,  we 
highlighted the similarity of q = 1 multifractal exponents for the same type of cloud
(marine  Sc)  at  three  different  locales  (FIRE  data  from  the  N-E  Pacific,  ASTEX  data  from 
the  Mid-Atlantic,  SOCEX  data  from  the  S-W  Pacific).  So  a  “bi-fractal”  characterization 
tells  us  nothing  about  the  local  climatology;  it  does  tell  us  however  something  about  the 
universality  of  the  thermo-hydrodynamical  processes  that  shape  the  internal  structure  of 
marine  Sc  layers.  In  contrast,  the  scaling  range,  which  is  defined  by  scale-breaks  at  either 
end,  does  vary;  it  appears  to  be  roughly  in  proportion  with  the  thickness  of  the  boundary 
layer (Davis et al. 1996a, Marshak et al. 1996).

From  the  cloud  radiative  perspective,  the  most  interesting  scale-break  is  the  one  at 
≈200 m in Figs. 5 and la relating to the reflected radiance fields of marine Sc. Indeed,
there  is  no  counterpart  of  this  statistically  robust  feature  in  the  LWC  data  measured  inside 
the  same  type  of  cloud.  Davis  et  al  (1996^)  survey  the  literature  on  this  scale-break  in 
the  energy  spectrum  and  they  describe  the  radiative  smoothing  mechanism  that  produces 
it, via multiple scattering. Marshak et al. (1995b) use multifractal methods to investigate
its  implications  for  cloud  remote  sensing.  In  both  studies,  the  phenomenology  of  the 
Landsat  scale-break  is  based  on  a  numerical  Green’s  function  analysis  of  horizontal 
photon  transport,  uncovering  its  dependence  on  geometrical  cloud  thickness  and  photon 
mean  free  path  (corrected  for  the  forward  scattering):  the  scale-break  occurs  at  the 
harmonic mean of these two fundamental scales.¹ Information about inherent cloud
properties  can  therefore  be  extracted  from  the  observation  of  the  scale-break.  Moreover, 


1  Interestingly,  this  finding  is  traceable  to  the  effect  of  the  non-illuminated  cloud  boundary  at  finite  range  on 
the  photons’  random  walks  that  are  modeled  by  nonstationary  Brownian  motion  (cf.  Appendix). 
it is hoped that in the near future laser and low-light detector technologies will be
combined  to  observe  directly  cloud  radiative  Green’s  functions.  This  would  enable 
robust  and  inexpensive  estimations  of  cloud  thickness  and  density,  both  important 
quantities  for  balancing  the  Earth’s  radiative  budget,  hence  forecasting  climate  change. 


7.  Summary 

We  have  presented  a  conceptual  model  for  geophysical  data  analysis  based  on 
laboratory  work.  The  “sample”  being  probed  is  the  data,  generally  collected  in  the  field: 
either time-series, 1D transects, or 2D images. The laboratory “instruments” are
computer  codes  that  process  this  data  and  output  2-  or  more-point  statistics,  our  focus 
being  on  scale-dependent  statistical  quantities  that  convey  information  about  spatial 
correlations. The “readings” of these instruments are analyzed on log[statistic] vs.
log[scale] plots, seeking straight lines that are the signature of scaling (power-law)
regimes. A number of remarkable exponents (log-log slopes) are discussed and several
criteria are presented: When is the datastream stationary? ... When does it have
stationary increments? ... Do we suffer from sampling (“ergodicity”) problems? ... Is
there enough intermittency to call for an inherently non-Gaussian “multifractal” model,
as opposed to a simpler Gaussian one with “monofractal” statistics? Finally, laboratory
exercises  are  almost  universally  designed  to  validate  or  challenge  the  prevailing  theory 
about  the  structure  and  dynamics  of  the  sample.  Experimental  results  can  therefore 
prompt  new  theory.  We  show  how  this  proves  true  in  both  cases  for  the  theory  of  cloud- 
radiation  interaction,  an  important  pre-requisite  in  climate  theory  and  the  remote  sensing 
of  cloud  properties.  We  therefore  conclude  that  multifractal  scale-by-scale  analysis  is  a 
powerful  — yet  currently  underexploited —  tool  in  geophysical  research,  well-suited  for 
connecting  theory  and  measurements  in  a  broad  range  of  applications. 


Appendix. SIMULATION AND CALIBRATION (Scale-Invariant Models:
Stationary or Not, Intermittent or Not, Ergodic or Not)

An  important,  quasi-universal  application  of  statistical  data  analysis,  multifractal  or 
other,  is  to  constrain  models  that  attempt  to  reproduce  the  data.  So  stochastic  simulation 
tools  are  often  developed  in  parallel  with  data  analysis  methods.  This  activity  has  both 
analytical and computational aspects. On the one hand, we need to write code that
implements specific algorithms for generating random functions of one or two variables.
On  the  other  hand,  we  need  to  know,  preferably  in  closed-form,  the  dependence  of  the 
statistical  quantities  of  interest  on  the  parameters  of  the  model.  In  our  experience, 
applications  are  two-fold: 

•  Stochastic  cloud  models  have  proved  invaluable  for  investigating  cloud  radiation 
properties  (e.g.,  reflectance,  transmittance,  and  absorption  of  solar-  and  laser  beams). 

•  Validation  of  data  analysis  algorithms:  before  applying  them  to  real  data,  it  is  crucial 
to  see  how  they  respond  to  artificial  data. 
In the “laboratory” analogy for data analysis used in this paper, the “instrumentation” is a
collection of computer programs that process the data-stream into statistics (large input,
small output). Others will help fit nonlinear (e.g., power-law) models to the statistics in a
more  or  less  supervised  manner.  Just  like  real  experimental  procedures,  these  instruments 
must  be  calibrated  with  standard  input,  in  our  case,  samples  of  data  with  known  and 
controllable  statistical  properties. 

In  this  Appendix,  we  present  a  comprehensive  collection  of  theoretical  models  to 
perform  this  task.  Simultaneously,  the  somewhat  abstract  concepts  of  stationarity  and 
intermittency  are  made  more  palatable.  On  the  one  hand,  we  need  both  mono-  and 
multiscaling  nonstationary  random  functions  with  stationary  increments  to  calibrate 
structure-function analysis. In section A.1, we recall how the simplest nonstationary
processes  are  obtained  from  white  noises  in  both  ergodic  and  non-ergodic  situations;  in 
section  A.2,  we  describe  monoscaling  fractional  Brownian  motions.  On  the  other  hand, 
we  need  stationary  mono-  and  multiscaling  random  measures  to  calibrate  singularity 
analysis  procedures.  In  section  A.3  such  models  are  presented.  In  section  A.4,  we  return 
to  nonstationary  functions  to  add  two  multiscaling  models,  both  developed  first  for  cloud 
studies,  that  complete  the  collection.  In  section  A.5  finally,  we  summarize  and  display 
graphically  connections  between  the  different  types  of  model. 

Here again, we are dealing with a number of computer programs that synthesize data
(small input, large output). This output is denoted $w(x_m)$, $f(x_m)$ or $\varepsilon(x_m)$ on a 1D grid of
constant $\ell$ and size $M = L/\ell$:

$x_m = m\ell, \quad m = 0, \ldots, M-1.$  (A0a)

Generalizations to 2D grids,

$(x_{m_1}, y_{m_2}) = (m_1, m_2)\,\ell, \quad (m_1, m_2) \in [0, M-1] \otimes [0, M-1],$  (A0b)

are straightforward in most cases, at least if statistically isotropic models are acceptable.
In one (more involved) case, the 2D construction is described in full detail (§A.3.5).
Where convenient, we denote spatial dimensionality by

$d = 1, 2.$  (A1)

Unless explicitly stated otherwise, we use units of length where

$\ell = 1.$  (A2)

For  each  model,  we  describe  the  generation  algorithm  in  sufficient  detail  for  direct 
coding;  the  main  statistical  properties  are  expressed  as  a  function  of  their  parameters, 
with  either  short  derivations  or  references  to  the  literature,  and  their  stationarity, 
intermittency  and/or  ergodicity  properties  are  discussed. 

A.1 Nonstationarity in 1D, Running Sums of Stationary Processes
A.1.1 Gaussian White Noise and Brownian Motion

Brownian motion (Bm), a.k.a. the Wiener-Levy process, is defined as the integral of
white noise (w.n.), i.e., the running sum of a sequence of uncorrelated “steps” in the
forward or backward direction: the steps are denoted $w(x_m)$, $m = 1, \ldots, M$. These numbers
are typically drawn from a Gaussian distribution but Bernoulli¹ and Laplace²
distributions are also used. Throughout this Appendix, we denote by $N(\mu,\sigma)$ Gaussian³
random variables of mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$). The important
requirement is finite variance to invoke the central limit theorem (CLT). For simplicity,
we also require a vanishing mean to eliminate systematic drifts; so $w(x_m) = N(0,\sigma)$.

Bm is therefore

$f(x_m) = f_0 + \sum_{j=1}^{m} w(x_j)$, no sum for m = 0.  (A3)

If $w(x_m)$ is Gaussian (or, more generally by the CLT, in the limit $m \gg 1$), $f(x_m)$ is Gaussian,
and so are its increments $\Delta f(r;x_m) = f(x_{m+r}) - f(x_m) = \sum_{j=m+1}^{m+r} w(x_j)$, where $m = 0, \ldots, M-r$ and
$r = 0, \ldots, M$, because sums of Gaussians are Gaussian (with cumulated means and
variances). It is easy to see that $\langle|\Delta f(r;x_m)|^q\rangle = \langle|\Delta f(r;x_0)|^q\rangle \propto \langle[f(x_r)-f_0]^2\rangle^{q/2} = (r\sigma^2)^{q/2}$,
where $\sigma^2 = \langle w(x_m)^2\rangle$ and⁴ $q > -1$; thus $\zeta(2) = 1$ and, in general, $\zeta(q) = q/2$ from the
definition of structure function exponents in Eq. (16). We now contrast the properties of
w.n. and Bm:

• Gaussian w.n. is the prototypical stationary process: all its statistics are independent
of the position $x_m$. Absolute 1-point moments are invariant, $\langle|w(x_m)|^q\rangle = \langle|w(x_1)|^q\rangle$ for
$m = 1, \ldots, M$ and $q > -1$, as is the 2-point autocorrelation function

$\langle w(x_{m+r})\,w(x_m)\rangle = \sigma^2\delta(r),$  (A4a)

for $m = 1, \ldots, M-r$. W.n. is not only stationary but ergodic: spatial averages over a
single but large enough realization converge to the above ensemble averages, largely
because of the Gaussian nature of $w(x_m)$. In practice, very good estimates can be
obtained from relatively small samples; see Fig. A1a for a case with M = 1024.

• Bm is the prototypical nonstationary process with stationary increments. Indeed, its
1-point statistics depend explicitly on m: we may have $\langle f(x_m)\rangle = \langle f_0\rangle$ (= 0 in the
following) for all m but⁵ $\langle f(x_m)^2\rangle = \langle f_0^2\rangle + m\langle w_1^2\rangle$ for m > 0. The same is true of the
2-point autocorrelation:

$\langle f(x_{m+r})\,f(x_m)\rangle \propto |m+r| + |m| - |r|,$  (A4b)

which follows from the identity $2f(x_{m+r})f(x_m) = f(x_{m+r})^2 + f(x_m)^2 - [f(x_{m+r}) - f(x_m)]^2$. In


¹A symmetric Bernoulli trial yields $w(x_m) = \pm y$ with equally probable signs, hence Bm on a grid (widely
used in diffusion theory).

²In neutron or photon transport, Laplacian or two-sided exponential free paths (mean $\lambda$) are in order:
$\pm\lambda\ln\xi$ with $\xi$ uniformly distributed on (0,1) and equally probable signs.

³The Box-Muller transformation can be used to obtain zero-mean, unit-variance deviates; one way to do
this is $N(0,1) = \cos(\pi\xi_1)(-2\ln\xi_2)^{1/2}$ where $\xi_i$ (i = 1,2) are computer-generated (pseudo-random) deviates
distributed uniformly on (0,1). From there $N(\mu,\sigma) = \mu + \sigma N(0,1)$.

⁴Generally the r-dependent pdf, $\mathrm{Prob}\{\Delta < \Delta f(r;x) < \Delta + d\Delta\}/d\Delta$, of the increment $\Delta f(r;x) = f(x+r) - f(x)$ (r > 0)
is non-vanishing at $\Delta = 0$; moments $\langle|\Delta f(r;x)|^q\rangle = \int|\Delta|^q\,\mathrm{Prob}(d\Delta)$ are therefore divergent for $q \le -1$.

⁵Setting $\langle f_0^2\rangle = 0$ enforces the nonstationarity (the m = 0 point is special). With $\langle f_0^2\rangle = M\langle w_1^2\rangle$, detrending
and cyclically extending it, the resulting Bm is (cyclo-)stationary. However, we are interested primarily in
scales $r \ll M$ for which this Bm is still nonstationary in the sense of the criterion in Eq. (28a): $\beta = 2 > 1$.
contrast, 2-point statistics based on $\Delta f(r;x_m)$ depend only on r. Turning to spatial
averages on a scale of r pixels ($1 \ll r \ll M$), 1-point means and variances will
fluctuate wildly (non-ergodicity) but averages using $\Delta f(r;x_m)$ are well-behaved
(ergodic increments) as long as Gaussian-type distributions are used. In summary,
Bm is neither stationary nor ergodic per se but has stationary and ergodic increments.
For an illustration, Fig. A1b shows the running sum of the data in Fig. A1a.
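
The contrast between w.n. and Bm can be checked numerically. The following is a minimal sketch, not the code used for Fig. A1: it draws Gaussian w.n., forms the running sum of Eq. (A3), and estimates the exponent ζ(q) from the log-log slope of spatially averaged increments. The function names, the dyadic scales r = 1, ..., 64 and the seed are our assumptions for illustration; only the Python standard library is used.

```python
import math
import random

def gaussian_white_noise(M, sigma=1.0, seed=0):
    """Return M uncorrelated N(0, sigma) steps w(x_1), ..., w(x_M)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(M)]

def brownian_motion(w, f0=0.0):
    """Running sum of Eq. (A3): f(x_m) = f0 + sum_{j=1..m} w(x_j), m = 0..M."""
    f = [f0]
    for step in w:
        f.append(f[-1] + step)
    return f

def structure_function(f, r, q=2.0):
    """Spatial average of |f(x_{m+r}) - f(x_m)|^q over all available m."""
    n = len(f) - r
    return sum(abs(f[m + r] - f[m]) ** q for m in range(n)) / n

def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) versus log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def estimate_zeta(f, q=2.0, scales=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the structure-function exponent zeta(q) on the given scales."""
    return loglog_slope(scales, [structure_function(f, r, q) for r in scales])
```

With a sample of about 2^15 steps, `estimate_zeta(f, 2.0)` should fall close to ζ(2) = 1 and `estimate_zeta(f, 1.0)` close to 1/2, within sampling error.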

A.1.2 Levy-Stable White Noise and Levy Flights

White noise need not be Gaussian, nor therefore so highly ergodic. Consider a power-law
(e.g., Pareto) distribution where moments above some critical order are divergent; this
means that their spatial estimates increase without bound with sample size. Figure A2a
illustrates this case with w.n. obeying a Cauchy law: moments of order q ≥ 1 diverge.
Only absolute moments of very small non-integer order can be properly estimated with
$M \approx 10^3$. The counterpart of Bm for the Cauchian w.n. is graphed in Fig. A2b. This
running sum of the data in Fig. A2a is a “Levy-stable” process, often referred to as a Levy
“flight” (Mandelbrot 1983). Here incremental ergodicity is of course violated.

By definition, Levy-stable variables are a four-parameter class of deviates $L(\alpha;a,b,c)$
obeying the rescaling equation:¹

$\sum_{i=1}^{n} (L_i - a) \overset{d}{=} n^{1/\alpha}(L - a),$  (A5)

• $\alpha$ is the “Levy index.” For $\alpha \to 0^+$, the solution is L = a (the degenerate distribution)
and normally distributed random variables obey the rescaling in Eq. (A5) when $\alpha = 2$.
In general, $0 < \alpha < 2$ is the critical order above which statistical moments diverge
(variance included). For further details, we refer to Feller’s (1971) treatise.

• a is a “centering” parameter, the counterpart of mean $\mu$ in Gaussian deviates.

• $b \in [-1,+1]$ controls skewness and has no equivalent in the Gaussian case.

• c is the amplitude parameter, like standard deviation $\sigma$ in the Gaussian case.

We focus here on the most straightforward generalizations of the Gaussian case (b = 0);
denoting these $L(\alpha;a,c)$, we have $N(\mu,\sigma) = \lim_{\alpha\to 2^-} L(\alpha;\mu,\sigma)$. The only case with a
closed-form pdf is $\alpha = 1$ (Cauchy deviates); the sample in Figs. A2a,b also has a = 0.
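
As an illustration of the α = 1 case, here is a hedged sketch of how Cauchy w.n. and the associated Levy flight might be generated. Inverting the Cauchy cdf is a standard technique, but it is our choice here and not necessarily the one used for Fig. A2; the scale c is the value quoted in the Fig. A2 caption, and all function names are hypothetical. The last helper checks Eq. (A5) empirically for a = 0: block sums of n deviates, divided by n^(1/α), should have the same median absolute value (namely c) as single deviates.

```python
import math
import random

# Scale quoted in the Fig. A2 caption: with this c, the Cauchy law has the
# same order-1/2 absolute moment as N(0,1).
C_MATCHED = math.gamma(0.75) ** 2 / (math.pi * math.sqrt(2.0))

def cauchy_white_noise(M, c=C_MATCHED, seed=0):
    """Symmetric Cauchy deviates L(1;0,c), by inversion of the cdf."""
    rng = random.Random(seed)
    return [c * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(M)]

def running_sum(w, f0=0.0):
    """Levy flight: running sum of the white noise, starting from f0."""
    f = [f0]
    for step in w:
        f.append(f[-1] + step)
    return f

def median_abs(values):
    """Median of |values|; equals c for a symmetric Cauchy law of scale c."""
    s = sorted(abs(v) for v in values)
    return s[len(s) // 2]

def rescaled_block_sums(w, n, alpha=1.0):
    """Sums of n consecutive deviates divided by n**(1/alpha), cf. Eq. (A5) with a = 0."""
    scale = n ** (1.0 / alpha)
    return [sum(w[i:i + n]) / scale for i in range(0, len(w) - n + 1, n)]
```

The median is used for the check because, unlike the sample mean or variance, it converges for Cauchy deviates.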

A.1.3 Spectral and Multifractal Properties

We have already shown that Bm (Gaussian-type increments) is monoscaling with

$H(q) = 1/2, \quad q > -1.$  (A6a)

The spectral exponent in $E(k) \propto k^{-\beta}$ is related to the q = 2 case by $\beta = 2H(2)+1 = 2$.

The defining relation (A5) for non-skewed Levy-stable deviates with vanishing mean
(a = 0) says that $|\Delta f(r;x_m)|^q \overset{d}{=} |\sum_{j=1}^{r} L_j(\alpha;0,c)|^q \overset{d}{=} r^{q/\alpha}|L(\alpha;0,c)|^q$. So, as long as $-1 < q < \alpha$,
Levy flight increments obey $\langle|\Delta f(r;x_m)|^q\rangle \propto r^{q/\alpha}$, therefore $\zeta(q) = q/\alpha$. What if $q > \alpha$?
In theory, the corresponding moments do not exist, so neither will the exponent $\zeta(q)$. In


¹The symbol “$\overset{d}{=}$” means “identity in distribution.”
Figure A2: Non-Ergodic Stationary and Nonstationary Processes. (a) White Levy-stable noise with the
same order 1/2 moment as the Gaussian case in Fig. A1a; this data is stationary but non-ergodic (at least
for moments of order q > 1). Symmetric Levy-stable deviates with zero mean $L(\alpha;0,c)$ with a Levy “index”
$\alpha = 1$ and a scaling parameter $c = \Gamma(3/4)^2/(\pi\sqrt{2}) \approx 0.338$ were used, obeying a Cauchy law:
$d\mathrm{Prob}\{x < L(1;0,c) < x+dx\} = c\,dx/[\pi(c^2+x^2)]$. (b) Running sum of the data in panel (a), in other words,
a “Cauchian” Levy-flight; this data is nonstationary with increments that are stationary but not ergodic.
The inset is a vertical zoom into the second half of the sample, showing more “jumps” of lesser magnitude.
practice, finite-sample estimates of $\langle|\Delta f(r;x_m)|^q\rangle$ become dominated by the most intense
event in the w.n. $w(x_m) = L(\alpha;0,c)$. The corresponding value of $x_m$ is straddled by exactly
r of the M-r (> r) increments included in the spatial averaging operation; this leads to
$\langle|\Delta f(r;x_m)|^q\rangle \approx \max_m\{|w(x_m)|\}^q\, r/M \propto r$, hence $\zeta(q) = 1$ for $q > \alpha$. In summary,
Levy-stable processes are “operationally” multiscaling with $\zeta(q) = \min\{q/\alpha, 1\}$; hence

$H(q) = \min\{1/\alpha, 1/q\}, \quad q > -1 \ (0 < \alpha < 2).$  (A6b)

Here again we find $\beta = 2$ (operationally) although for radically different reasons than in
Bm. To obtain Bm, we integrate ($\beta \to \beta+2$) a sample of w.n. ($\beta = 0$) with spectral
density $E(k) \approx \text{constant} < \infty$. In Levy flights, spectral density (like variance) diverges:
$E(k) \to \infty$ with sample size, via the prefactor. In a finite sample, f(x) has a finite number
of discontinuities, hence $\beta = 2$ as for Heaviside steps.¹
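
The “operational” exponents of Eq. (A6b) can be summarized in a few lines. This sketch is only a restatement of the formulas above, restricted to q > 0 (the function names are ours); it uses β = ζ(2) + 1, which is the relation β = 2H(2) + 1 quoted earlier.

```python
def zeta_levy(q, alpha):
    """Operational structure-function exponent of a Levy flight (0 < alpha < 2, q > 0)."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    if q <= 0.0:
        raise ValueError("this sketch only covers q > 0")
    return min(q / alpha, 1.0)

def hurst_levy(q, alpha):
    """H(q) = zeta(q)/q, i.e. min{1/alpha, 1/q} for q > 0."""
    return zeta_levy(q, alpha) / q

def beta_from_zeta2(zeta2):
    """Spectral exponent from the second-order exponent: beta = zeta(2) + 1."""
    return zeta2 + 1.0
```

For any α in (0, 2), `beta_from_zeta2(zeta_levy(2.0, alpha))` returns 2, which is the operational value of β discussed above.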

A.2  Fractional  Brownian  Motions  as  Nonstationary  Monofractal  Functions 

A.2.1  General  Properties  of  Fractional  Brownian  Motion 

Processes  known  as  “fractional  Brownian  motion”  (fBm)  are,  like  Levy  flights, 
generalizations  of  Bm  but  in  a  different  direction.  They  were  introduced  by  Mandelbrot 
and  van  Ness  (1968)  to  reproduce  the  correlation  properties  of  otherwise  Bm-like  “steps” 
observed  in  numerous  physical  signals,  for  instance,  turbulent  velocity.  Like  standard 
Bm  (uncorrelated  steps),  fBm  is  a  nonstationary  random  function  with  stationary 
increments. 

The  noteworthy  property  of  all  scale-invariant  nonstationary  signals  and  models,  fBm 
in  particular,  that  describes  incremental  correlations  is 

$\langle\Delta f(r;x+r)\,\Delta f(r;x)\rangle = [2^{2H(2)-1} - 1]\,\langle\Delta f(r;x)^2\rangle.$  (A7)

This follows directly from Eq. (16), expressed for r and 2r at q = 2, and the stationarity of
the increments. So, at H(2) = 1/2 we retrieve Bm with characteristically uncorrelated
increments in (A7). For 0 < H(2) < 1/2, we have negatively correlated increments: a
jump up is more often than not followed by a jump down and vice-versa. This leads to a
less nonstationary process than Bm, “anti-persistence” in Mandelbrot’s (1983) words.
For instance, Kolmogorov (1941) scaling in turbulent signals corresponds to H(2) ≈ 1/3.
For 1/2 < H(2) < 1, we have the opposite situation: positively correlated increments or
“persistence” in Eq. (A7). If a jump tends to be followed by another in the same
direction, then we have more nonstationarity than in Bm. Mandelbrot notes that the
Earth’s topography is reasonably well modeled by setting H(2) ≈ 0.75.
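
For completeness, the short argument behind Eq. (A7) can be written out as follows. This is our reconstruction from the two ingredients named above, namely Eq. (16) at q = 2 for scales r and 2r, and the stationarity of the increments, with ζ(2) = 2H(2).

```latex
\begin{align*}
\Delta f(2r;x) &= \Delta f(r;x) + \Delta f(r;x+r), \\
\langle \Delta f(2r;x)^2 \rangle
  &= 2\,\langle \Delta f(r;x)^2 \rangle
   + 2\,\langle \Delta f(r;x+r)\,\Delta f(r;x) \rangle
   && \text{(stationary increments)}, \\
\langle \Delta f(2r;x)^2 \rangle
  &= 2^{2H(2)}\,\langle \Delta f(r;x)^2 \rangle
   && \text{(Eq. (16) at } q = 2\text{)}, \\
\langle \Delta f(r;x+r)\,\Delta f(r;x) \rangle
  &= \left[ 2^{2H(2)-1} - 1 \right] \langle \Delta f(r;x)^2 \rangle .
\end{align*}
```

The bracket evaluates to 2^(-1/3) - 1 ≈ -0.21 for H(2) = 1/3, to 0 for H(2) = 1/2, and to 2^(1/2) - 1 ≈ 0.41 for H(2) = 3/4.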

The  higher-order  structure  functions  of  fBm  obey 

$H(q) = \zeta(q)/q \equiv H_2, \quad 0 < H_2 < 1.$  (A8)

We  recall  from  §4.2  in  the  main  body  that  relation  (A8)  arises  whenever  the  increments 
are  distributed  narrowly  enough,  Gaussian-style,  that  we  can  use  the  simplest  dimensional 


¹The only difference in Fourier space between w.n. and a Dirac $\delta$ is the random phases; both have $\beta = 0$.
arguments to estimate higher-order moments from low-order ones: taking q = 2 as the
standard, then $\langle|f(x+r)-f(x)|^q\rangle \approx \langle[f(x+r)-f(x)]^2\rangle^{q/2}$, $q > -1$. The definition of structure
function exponents in Eq. (16) then leads to $\zeta(q) = \zeta(2)q/2 = qH(2) = qH_2$, hence (A8).

Equation (A8) shows that fBm belongs to the restricted class of monofractal random
functions; its only parameter relevant to scaling, $H_2$, is equal, in particular, to the mean
Holder exponent $H_1 = H(1)$. Figure A3a shows $\zeta(q)$ versus q for $H_1 = H_2$ = 1/3, 1/2, 2/3,
hence an increasing degree of nonstationarity. The two extreme cases are also shown:

• $H_1 = H_2 \to 0$ is the “stationary” limit where increments are not only scale-invariant
but scale-independent; and

• $H_1 = H_2 \to 1$ is the “differentiable” limit where increments become (almost)
everywhere directly proportional to the scale.

In the remainder of this section, we describe the two major computational routes to
fBm (one in physical space, the other in Fourier space) and a variant using running
sums. The common denominator of these methods is to use only Gaussian random
numbers and additions, hence the term “additive” is frequently used for this whole class
of models. Sums of normal deviates are normally distributed, so all quantities involved
are Gaussian (most importantly, f(x) and f(x+r)-f(x)); therefore Eq. (A8) is verified by
construction. Non-Gaussian multifractal models that violate (A8) were discussed above
and will be again in §A.4, after surveying the prerequisite singular measures in §A.3.

A.2.2  Synthesis  in  Physical  Space  (Recursive  Mid-Point  Displacement) 

The simplest algorithm for making fBm is known as “mid-point displacement,” cf.
Peitgen et al. (1988). For procedures in physical space, it is generally convenient to use
units of length where the grid constant is unity and to take a grid size equal to a power
of 2 plus one:

$M = L_n + 1 = 2^n + 1, \quad n > 0.$  (A9)

To determine $f_n(x_j)$ for $j = 0, \ldots, 2^n$, we first set¹

$f_n(0) = f_0$  (A10)

JWo.cJo)  where  Cq  =  2(^-f^2)-l,  or 
M^n)  -  I  j^(o)  for  cyclical  boundary  conditions  ' 

This completes the maximum scale $r_0 = L_n$. We then proceed recursively to smaller and
smaller scales,

$r_i = r_{i-1}/2 = r_0/2^i = 2^{n-i} \quad (i = 1, \ldots, n),$  (A12)

using a decreasing sequence of standard deviations,

$\sigma_i = \sigma_{i-1}/2^{H_2} = \sigma_0/2^{iH_2} \quad (i = 1, \ldots, n).$  (A13)

New points are generated by averaging existing ones and adding random displacements:

$f_n(x_j + r_i) = [f_n(x_j + 2r_i) + f_n(x_j)]/2 + N(0,\sigma_i), \quad x_j = 2j\,r_i \quad (j = 0, \ldots, 2^{i-1}-1).$  (A14)


¹The value of $f_n(0)$ in (A10) is arbitrary but deterministic, hence the patently nonstationary nature of fBm.
Results for five cases are presented in Fig. A3b, illustrating more and more
nonstationarity: $H_2 = 0$ (“1/f” noise, cf. next section); $H_2 = 1/3$ (anti-persistence, as in
turbulent signals); $H_2 = 1/2$ (Bm); $H_2 = 2/3$ (persistence, as in topography); and $H_2 = 1$
(a noiseless linear trend). For the generalization to 2D, we refer to Peitgen et al. (1988).
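
The recursion in Eqs. (A9)-(A14) translates almost line by line into code. The sketch below is one possible transcription, under two assumptions of ours: the large-scale standard deviation σ_0 is left as a free input (`sigma0`), and Python's standard `random.gauss` supplies the normal deviates. Function and argument names are hypothetical.

```python
import random

def midpoint_displacement(n, H2, sigma0=1.0, f0=0.0, cyclic=False, seed=0):
    """Mid-point displacement sample on 2**n + 1 grid points, after Eqs. (A9)-(A14)."""
    rng = random.Random(seed)
    L = 2 ** n                                   # Eq. (A9): M = L + 1 points
    f = [0.0] * (L + 1)
    f[0] = f0                                    # Eq. (A10)
    # End point at the largest scale r_0 = L: random, or equal to f(0)
    # for cyclical boundary conditions.
    f[L] = f0 if cyclic else rng.gauss(f0, sigma0)
    for i in range(1, n + 1):
        r = L >> i                               # Eq. (A12): r_i = 2**(n - i)
        sigma = sigma0 / 2 ** (i * H2)           # Eq. (A13)
        for j in range(2 ** (i - 1)):            # Eq. (A14), with x_j = 2*j*r_i
            x = 2 * j * r
            f[x + r] = 0.5 * (f[x + 2 * r] + f[x]) + rng.gauss(0.0, sigma)
    return f
```

Setting `cyclic=True` enforces equal end points, and setting `sigma0=0` returns the constant f_0 everywhere, which makes for a quick sanity check of the indexing.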


Figure A3: Fractional Brownian Motions in 1D, with More or Less Nonstationarity. (a) qth-order
structure function exponents $\zeta(q) = qH_1$ versus order of absolute moment, q. (b) Three different fBm’s ($H_1 = H_2$ = 1/3, 1/2, 2/3, by increasing
degree of nonstationarity) and two limiting cases: $H_1 = 0$ (“1/f noise,” border-line stationarity) and $H_1 = 1$
(differentiability, maximum nonstationarity on this scale).

A.2.3  Synthesis  in  Fourier  Space  (Power-Law  Filtering) 

Another way of generating fBm is with low-pass power-law filtering in Fourier space,
a procedure also known as “fractional integration” that we will invoke again further on to
generate multifractal signals.¹

Following Voss (1983), we start with Gaussian white noise on a grid of size $2^n$:

$w_n(x_j) = N(0,1), \quad j = 0, \ldots, 2^n - 1.$  (A15)

This  trivial  stochastic  process  made  of  normal  deviates,  completely  uncorrelated  from  one 
grid  point  to  the  next,  is  scale-invariant  by  construction  and  characterized  by  a  spectral 
exponent 

$\beta_w = 0.$  (A16)

We now want to incorporate correlations, leading to

$\beta = 2H_2 + 1.$  (A17)

This is easily done by power-law filtering in Fourier space, where it is convenient to set

$L = 2^n\ell_n = 1.$  (A18)


¹Pearson (1990) surveys fractional integration, as defined in the mathematical literature, which is somewhat
different from ours.
After obtaining¹

$\tilde{f}_n(k) = \tilde{w}_n(k)/|k|^{H_2+1/2} \ (k \neq 0), \qquad \tilde{f}_n(0) = \tilde{w}_n(0) \approx 0,$  (A19)

where $\tilde{w}_n(k)$ is the Fourier transform of $w_n(x_j)$,
we compute $f_n(x_j)$ by inverse Fourier transform for $x_j = j\ell_n$ ($j = 0, \ldots, 2^n-1$). The exponent
$H_2+1/2$ ($0 < H_2 < 1$) in Eq. (A19) is naturally the “order of the fractional integration.”
Generalization to 2D is straightforward.
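
A hedged sketch of the Fourier-space recipe follows. It assumes that the filter |k|^-(H_2+1/2) is applied to every nonzero mode with the k = 0 mode left untouched, uses the integer mode index in place of k (which only changes the overall amplitude), and relies on a small recursive radix-2 FFT so that only the standard library is needed; in practice an optimized FFT library would normally be used instead. All names are ours.

```python
import cmath
import random

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of 2."""
    N = len(a)
    if N == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def fractional_integration(w, H2):
    """Filter w by |k|**-(H2 + 1/2); phases and the k = 0 mode are unchanged."""
    N = len(w)
    W = fft([complex(v) for v in w])
    F = [W[0]]
    for m in range(1, N):
        k = min(m, N - m)                # |mode index|, symmetric about N/2
        F.append(W[m] * k ** -(H2 + 0.5))
    f = fft(F, inverse=True)
    return [v.real / N for v in f]       # normalize the inverse transform

def fbm_fourier(n, H2, seed=0):
    """Return (white noise, filtered signal) on a grid of 2**n points."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(2 ** n)]
    return w, fractional_integration(w, H2)
```

With H_2 = -1/2 the filter reduces to the identity and the white noise is returned unchanged. Because the filter is symmetric in the mode index, the output of a real input stays real.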

Equation (A17) still applies if we take $H_2 < 0$ hence $\beta < 1$, or $H_2 > 1$ hence $\beta > 3$, in
(A19). The latter choice yields nonstationary random functions with nonstationary
increments, varying so weakly in the small scales that they are everywhere continuous
and almost everywhere differentiable. So, increments become proportional to distance
between points of interest: $\langle|f(x+r)-f(x)|\rangle \approx \langle[f(x+r)-f(x)]^2\rangle^{1/2} \propto r$, leading to H(q) = 1.
We now discuss the consequences of the former choice.

A.2.4  Synthesis  from  Stationary  Gaussian  Scaling  Noises 

What if we take $H_2 < 0$ in the recipe contained in Eq. (A19)? The recursive mid-point
displacement procedure in §A.2.2 can also be applied for negative² $H_2$. However, the
outcomes for $H_2 < 0$ and $H_2 > 0$ have radically different properties.

• If $H_2 < 0$, the corresponding scale-invariant noise has an energy spectrum $E(k) \propto k^{-\beta}$
with $\beta < 1$; there is finite energy (i.e., variance) at large scales ($k \to 0$) but
divergence at small ones ($k \to \infty$): a so-called “ultra-violet” catastrophe occurs.
This means that small-scale singularities will develop whereas large-scale properties
will remain relatively well-behaved.

• For $H_2 > 0$ ($\beta > 1$), the opposite occurs: an “infra-red” catastrophe. We can
therefore expect large fluctuations at scales comparable to the computational domain,
but relatively small ones at the pixel scale.

Only for $H_2 < 0$ is there a well-defined, monotonically decreasing power-law
autocorrelation function:³ $\langle f(x+r)f(x)\rangle \propto r^{-(1-\beta)} = r^{2H_2}$, hence stationarity in the broad
sense. For $0 < H_2 < 1$, a power-law structure function for q = 2 follows from a power-law
energy spectrum E(k): $\langle[f(x+r)-f(x)]^2\rangle \propto r^{\beta-1} = r^{2H_2}$, hence (broad-sense) stationary increments.

We will refer to these models with $H_2 < 0$ as “Stationary Gaussian Scaling Noises”
(or SGSNs). One-dimensional fBm with spectral exponent $1 < \beta < 3$ can be obtained by
standard integration (i.e., running sums in physical space) of SGSNs with $\beta' = \beta-2$ such
that $|\beta'| < 1$. These SGSNs are first obtained by fractional integration of Gaussian w.n. at
order $\beta'/2 = H_2+1/2$, ranging from -1/2 to 1/2 ($-1 < H_2 < 0$). We note that, for the
anti-persistent case ($1 < \beta < 2$), the corresponding SGSNs result in effect from a fractional

^Note  that  we  leave  the  phases  tan~'(Inip>„(/:)]/Re[tt'„(A)])  of  the  Fourier  components  unchanged. 
However,  to  retrieve  Bm  exactly  (as  defined  in  §A.1.1),  we  need  to  change  the  phases  by  exactly  %/2: 
fnik)  =  Wnik)l(ik)  will  yield  a  realization  with/o  =  vv„(0)L  ~  0. 

^The  choice  H2  =  -1/2  leads  to  a  rather  convoluted  way  of  generating  uncorrelated  values  at  every  pixel. 
In  contrast,  this  choice  implies  a  null  operation  in  the  Fourier-based  recipe  in  Eq.  (A19). 

^This requires $0 < \beta < 1$ ($-1/2 < H_2 < 0$), otherwise ($H_2 < -1/2$) there are anti-correlations from one
grid-point to the next; thus $\langle f(x+r)f(x)\rangle < 0$, and a power-law parameterization is invalid.

“differentiation” of Gaussian white noise, the order of “integration” $\beta'/2$ being negative,
between 0 (for Bm) and $-1/2$ (for “$1/f$” noise).

A.3  Cascades Leading to Singular, Stationary, Mono- and Multifractal Measures

Calibration of singularity analysis algorithms calls for a different class of models than
described above. The main requirement is that the model can be read as a “measure,” a
non-negative^ field that we will denote generically by $\varepsilon(x)$. This excludes the models
discussed up to now. In practical data analysis applications, the nonstationary data
stream can be used to generate a measure, typically by taking small- but finite-scale
gradients, then their absolute values or squares. We can do this for the models presented
in section A.2 but the results are quite disappointing, the small-scale gradient fields of
Gaussian models being at best weakly variable (narrowly distributed).

Let $f(x)$ be a nonstationary additive process based on $w(x)$, a Gaussian-type noise such
as an SGSN. This gives $\varepsilon(x) = |w(x)|$ which varies less than $w(x)$ itself. Being essentially
a running mean, the coarse-grained measure $\varepsilon(r;x) = (1/r)\sum_{x'=x}^{x+r}\varepsilon(x')$ will be statistically
independent not only of $x$, stationarity oblige, but also of $r$ over a large range of values
because of the strong ergodic property of Gaussian-like $w(x)$. This translates to $K(q) = 0$
in Eq. (20), apart from finite sampling effects (Aurell et al. 1992). The same is true for
models of $w(x)$ with power-law tails, except that moments exist only up to some critical
order: $K(q) = 0$ for $q < q_D$; for instance, Levy-stable variables yield $q_D = \alpha < 2$.

To obtain models with $K(q) \neq 0$, we must leave the realm of additive models and
generate measures with multiplicative cascades. In the same way as fBm’s are
monofractal random functions $f(x)$ with multifractal counterparts (some of which are
described in Section A.4), there are mono- and multifractal measures. The literature on
multifractal measures is vast and increasing at a rapid pace. In this survey, we will start
with some well-known monofractal cases in 1D and 2D (§A.3.1); straightforward
generalizations of these (§A.3.2) lead to multiscaling in 1D (§A.3.3-4) and 2D (§A.3.5).

Mono-  and  multifractal  cascade  models  come  in  two  distinct  flavors:  the  “canonical” 
(§A.3.3)  and  the  “microcanonical”  (§A.3.4-5).  The  latter,  having  the  same  singularity 
properties  for  every  realization  (ergodicity  in  the  sense  of  singular  measures),  are 
particularly  useful  for  the  purposes  of  calibration;  the  former  are  arguably  more  realistic 
models  for  geophysical  fields  (Schertzer  and  Lovejoy  1987,  Gupta  and  Waymire  1993). 
However, canonical models are always non-ergodic at some level of confidence and can sometimes
exhibit  the  interesting  statistical  phenomenon  of  divergence  of  higher-order  moments.  If 
ignored,  this  last  characteristic  can  affect  a  calibration  procedure. 

A.3.1  Cantor’s, Dirac’s and Other Monofractal Measures

Consider the example of Cantor’s measure, supported entirely by Cantor’s famous set
(Fig. A4a) which has fractal dimension $D_f = \log 2/\log 3 = \log_3 2 = 0.631\ldots < d = 1$.


^Mathematically, “measures” can be thought of as “generalized” functions in the sense of Dirac (i.e.,
defined only under integrals) that are furthermore non-negative. The sum of integrals over disjoint sets is
therefore equal to the integral over their union.

Cantor’s measure is generated by starting with a uniform distribution, $\varepsilon_0(x) = 1$ on $[0,1]$
which is then divided into 3 equal parts. The middle^ third is emptied of its measure
(this cell, or “eddy,” is now “dead”) and its mass is uniformly redistributed between
its neighbors where $\varepsilon_0(x)$ therefore becomes $\varepsilon_1(x) = \varepsilon_0(x)\times(1+1/2) = 3/2$. We have thus
required the spatial average to remain constant. After $n$ steps, the scale is $r_n = 3^{-n}$, there
are $3^n$ cells in all, only $2^n$ of which carry a measure $\varepsilon_n(x) = (3/2)^n$, elsewhere $\varepsilon_n(x) = 0$.
Although the measure is deterministic, one can still define its spatial statistics:
$\langle\varepsilon(r_n;x)^q\rangle = (2/3)^n\times(3/2)^{nq} = [(1/3)^n]^{(1-q)\log_{1/3}(2/3)}$. Equation (20) in the main text then
yields

    $K(q) = [1-\log_3 2](q-1), \quad q > 0.$  (A20a)

For $q = 0$ we have $K(q) = 0$ (by definition) and, for $q < 0$, we find $\langle\varepsilon(r_n;x)^q\rangle = \infty$
(formally, we can set $K(q) = \infty$ here) because the empty events, $\varepsilon(r_n;x) = 0$, dominate.
We note that the factor $1-\log_3 2$ in (A20a) is the codimension $d - D_f$ of Cantor’s set.
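The Cantor construction and Eq. (A20a) can be checked numerically. The following is a minimal sketch, not the authors' code; the function names (`cantor_measure`, `moment`, `K_estimate`, `K_theory`) are hypothetical, and the estimate assumes the scaling convention $\langle\varepsilon(r_n;x)^q\rangle \sim r_n^{-K(q)}$ with $r_n = 3^{-n}$.

```python
import math

def cantor_measure(n):
    """Return the 3**n cell values of Cantor's measure after n cascade steps."""
    cells = [1.0]
    for _step in range(n):
        nxt = []
        for value in cells:
            # Outer thirds are boosted by 3/2, the middle third is emptied.
            nxt.extend([1.5 * value, 0.0, 1.5 * value])
        cells = nxt
    return cells

def moment(cells, q):
    """Spatial average of eps**q over all cells (q > 0)."""
    return sum(v ** q for v in cells) / len(cells)

def K_estimate(n, q):
    """K(q) from <eps^q> = r_n**(-K(q)) with r_n = 3**(-n)."""
    return math.log(moment(cantor_measure(n), q)) / (n * math.log(3.0))

def K_theory(q):
    """Eq. (A20a): K(q) = (1 - log_3 2)(q - 1) for q > 0."""
    return (1.0 - math.log(2.0, 3.0)) * (q - 1.0)

if __name__ == "__main__":
    for q in (0.5, 2.0, 3.0):
        print(q, K_estimate(5, q), K_theory(q))
```

Because the measure is deterministic and exactly self-similar, the estimate should agree with the closed form at any number of steps, not only asymptotically.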

An example embedded in 2D is easily generated as follows. Start with a uniform
distribution, $\varepsilon_0(x,y) = 1$, on the unit square; then divide it into 4 equal parts; one
sub-square is picked at random^, and its measure (1/4 of total) is set to 0; the measure in the
neighboring cells is boosted proportionately, becoming $\varepsilon_1(x,y) = \varepsilon_0(x,y)\times(1+1/3) = 4/3$.
After $n$ steps, the scale of interest is $r_n = 2^{-n}$; there are $4^n$ cells in all with only $3^n$ of them
containing a non-vanishing measure, $\varepsilon_n(x,y) = (4/3)^n$. Spatial averaging leads therefore to
$\langle\varepsilon(r_n;x)^q\rangle = (3/4)^n\times(4/3)^{nq} = [(1/2)^n]^{(1-q)\log_{1/2}(3/4)}$, hence

    $K(q) = [2-\log_2 3](q-1), \quad q > 0.$  (A20b)

The support of this measure is a fractal set of codimension $-K(0^+) = 2-\log_2 3$, hence a
fractal dimension $D_f = 2+K(0^+) = \log_2 3 = 1.585\ldots < d = 2$, as can be verified by direct
box-counting methods.^
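The box-counting verification mentioned above can be sketched as follows. This is an illustrative sketch only: the function names and the integer-index representation of cells are assumptions, and only the support of the measure is tracked (every live cell carries the value $(4/3)^n$).

```python
import math
import random

def dead_cell_cascade_2d(n, seed=0):
    """Live cells (ix, iy) on a 2**n x 2**n grid after n steps of the 2D cascade."""
    rng = random.Random(seed)
    live = {(0, 0)}
    for _step in range(n):
        nxt = set()
        for (ix, iy) in live:
            children = [(2 * ix + a, 2 * iy + b) for a in (0, 1) for b in (0, 1)]
            children.remove(rng.choice(children))  # one sub-square is emptied at random
            nxt.update(children)
        live = nxt
    return live

def box_count(live, n, level):
    """Number of boxes of size 2**(-level) needed to cover the live cells."""
    shift = n - level
    return len({(ix >> shift, iy >> shift) for (ix, iy) in live})

def fractal_dimension(live, n):
    """D_f = log N_s / log(1/r) at the finest scale r = 2**(-n)."""
    return math.log(len(live)) / math.log(2.0 ** n)

if __name__ == "__main__":
    live = dead_cell_cascade_2d(6, seed=1)
    print([box_count(live, 6, k) for k in range(7)])  # powers of 3
    print(fractal_dimension(live, 6), math.log2(3))
```

The count is $3^k$ at every level $k$ regardless of which sub-squares are killed, which is why the dimension estimate does not depend on the random seed.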

The simplest random monofractal measure is probably a Dirac $\delta$-function positioned
at a random point $x^*$ on the unit segment in $d = 1$: $\varepsilon(x) = \delta(x-x^*)$, $0 < x^* < 1$; we can also
consider the unit square in the $d = 2$ case: $\varepsilon(x,y) = \delta(x-x^*)\delta(y-y^*)$, $(x^*,y^*) \in [0,1]^2$.
These measures are entirely concentrated onto a single point; in other words the fractal
dimension of their support is $D_f = 0$, hence a codimension $d - D_f = d$. Let us estimate the
statistical moments of the coarse-grained measure $\varepsilon(r;x)$ contained in the sub-interval
$[x,x+r)$ or sub-domain $[x,x+r)\otimes[y,y+r)$, $0 < r < 1$, according to Eq. (14): taking $x$ (and, if
necessary, $y$) at random, we find $\varepsilon(r;x) = 1/r^d$ with probability $r^d$ and 0 otherwise.
Therefore, we have $\langle\varepsilon(r;x)^q\rangle = r^{d(1-q)}$, hence, from Eq. (20):

    $K(q) = d(q-1), \quad q > 0.$  (A20c)

All of the above formulas (A20a,b,c) can be parameterized as

    $K(q) = (q-1)C_0, \quad q > 0,$  (A21)


^The  “middle-third”  convention  leads  to  a  deterministic  measure.  The  spatial  statistics  are  unchanged  by 
picking  the  next  dead  cell  at  random. 

^If  the  same  (e.g.,  upper-left)  sub-square  is  picked  each  time,  the  limiting  set  is  akin  to  Sierpinski’s  triangle. 
^Total number of boxes of size $r = 1/2^n$ ($n > 0$) in unit square: $N_r = (2^n)^2 = (1/r)^d$ ($d = 2$). Number needed
to cover fractal set: $N_s = 3^n \propto (1/r)^{D_f}$. Fractal dimension of set: $D_f = \log N_s/\log(1/r) = \log_2 3$ (QED).

where $C_0 = d - D_f > 0$ is the codimension of the support. The above measures are
“monofractal” in the sense that their supports are nontrivial fractals; however, on these
sparse subsets of space, they are uniformly distributed. The linearity of their exponents $K(q)$,
spelled out in Eq. (A21), can be traced to this uniformity.

Another common feature of the above monofractal models is that they were
constrained, largely for tutorial purposes, to have the same statistics for every realization
(and indeed at every cascade step); in particular, the total measure is conserved. Such
models are called “microcanonical” (Mandelbrot, 1974). “Canonical” counterparts of
these models (where only the probability of killing a cell is prescribed, irrespective of
what happens to its neighbors) have been proposed to model 1D transects of the kinetic
energy dissipation field in fully-developed 3D turbulence (Novikov and Stewart, 1964;
Mandelbrot, 1974; Frisch et al., 1978). Indeed there is no reason to require a 1D sample
of a 3D field to obey a conservation law that applies at best to the whole volume of the
system. The observed statistics of turbulent flows are best reproduced by models with $C_1$
values in the range 0.2-0.3. Following the nomenclature of Frisch et al. (1978), we refer
to these monoscaling canonical cascade models collectively as “beta” models.


Figure A4: Two Multiplicative Cascades in 1D with branching ratio $\lambda = 3$. (a) First 3 cascade steps in the
construction of Cantor’s deterministic measure, with $D(q) = \log_3 2 = 0.63\ldots$ (b) A random log-normal
measure, cascade steps # 1, 2, 4 (inset) and 6, with $D(q) = d - C_1 q$ where $d = 1$, $C_1 = 0.11\ldots$ and $q < d/C_1$.

A.3.2  Canonical Random Measures and Divergence of Higher-Order Moments


There is no reason to limit ourselves to uniform measures. A simple way of obtaining
non-uniform measures is to allow the random multiplicative “weights” used implicitly in
the above cascade algorithms to differ from 0 or a constant. In general, we can envision a
“turbulent” cascade process in $d = 1$ or 2 dimensions that proceeds by divisions into
“sub-eddies” of equal size at each step; the integer $\lambda$ is known as the “branching ratio.”
After $n$ steps the scale is $r_n = \lambda^{-n}$; we have

    $\varepsilon_n = \prod_{i=1}^{n} W_i$,  (A22a)

requiring only that

    $\langle W\rangle = 1$.  (A22b)

Redefining $K(q)$ temporarily as $-\ln\langle\varepsilon_n^q\rangle/\ln r_n$ (as opposed to the parameterization in
the main text’s Eq. (20) of the integrals (measures) defined in Eqs. (14-15)), it is easy to
see that

    $K(q) = \ln\langle W^q\rangle/\ln\lambda = \log_\lambda\langle W^q\rangle, \quad q \in \{q \in \Re;\ \langle W^q\rangle < \infty\}$.  (A23a)

Defined in this way, $K(q)$ inherits all of the analytical properties of cumulant-generating
functions^ for the random variables $\ln W$. In particular, $K(q)$ is convex with $K(0) = 0$. We
also know that $K(1) = 0$ due to (A22b).
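For weights drawn from a finite set of values, Eq. (A23a) can be evaluated directly. The sketch below is illustrative; the function name `K` and its argument layout are assumptions. Null weights are skipped, which matches the $q \to 0^+$ convention (an empty cell contributes nothing to $\langle W^q\rangle$ for $q > 0$).

```python
import math

def K(q, weights, probs, lam):
    """K(q) = log_lambda <W^q> for a discrete weight distribution, Eq. (A23a).

    weights, probs: values of W and their probabilities (probs sum to 1).
    lam: branching ratio lambda.
    """
    mean_wq = sum(p * w ** q for w, p in zip(weights, probs) if w > 0.0)
    return math.log(mean_wq) / math.log(lam)

if __name__ == "__main__":
    # Bernoulli weights of the Cantor construction: W = 0 (1/3) or 3/2 (2/3).
    cantor_w, cantor_p = (0.0, 1.5), (1.0 / 3.0, 2.0 / 3.0)
    print(K(1.0, cantor_w, cantor_p, 3))   # ~0, from <W> = 1
    print(K(2.0, cantor_w, cantor_p, 3))   # 1 - log_3 2, as in Eq. (A20a) at q = 2
```

With the Cantor weights the function reproduces the linear $K(q)$ of Eq. (A20a); with two non-zero weights it gives a strictly convex curve.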

Returning to the original definition of $K(q)$ in relation to the measures in Eq. (20) is
quite involved mathematically (Mandelbrot, 1974; Kahane and Peyrière, 1976; Schertzer
and Lovejoy, 1987; Gupta and Waymire, 1993). However, the only difference is
ultimately that the trivial condition on $q$ in Eq. (A23a) is replaced by

    $q < q_D$, where $q_D > 1$ solves $K(q_D) = d(q_D - 1)$.  (A23b)

For $q > q_D$, the moments $\langle\varepsilon(r;x)^q\rangle$ are divergent. In practical data analysis applications,
the symptom of a diverging moment is that its estimate is unstable, being dominated by
the single largest event. The intensity of this event will depend critically on the sample
size (in this case, grid size and number of realizations); we refer to Schertzer and
Lovejoy (1992) for further details.

In summary, we can view the monofractal models of §A.3.1 as a limiting case in a
continuum of models. The codimension of the measure’s support is^

    $C_0 = -K(0^+)$.  (A24)

If, as in monofractal cases, $C_0 > 0$ then it is arguably the most important parameter of the
model since it defines geometrically the concentration of the measure on a sparse subset

^The characteristic function of a random variable $\xi$ is defined as $\phi(t) = \langle\exp[it\xi]\rangle$, i.e., the Fourier transform
of its pdf; the cumulant-generating function (or “2nd characteristic function”) is $\ln\phi(t)$. Taking $\xi = \ln W$
and $it = q$, we have $K(q) = \ln\phi(q/i)/\ln\lambda$. For imaginary arguments, $\phi(t)$ is real and convex (Feller 1971).
^For $q = 0^+$, everywhere $\varepsilon(r;x) > 0$ we have $\varepsilon(r;x)^q = 1$, otherwise (i.e., $\varepsilon(r;x) = 0$), we have $\varepsilon(r;x)^q = 0$.

of space. In many cases however (cf. next sub-section), the probability of drawing
exactly null weights is vanishingly small; this leads to $K(0^+) = K(0) = 0$, hence $C_0 = 0$.

In situations where $C_0 = 0$ (i.e., the measure is supported by all of space), the next
simplest way of quantifying its degree of concentration (hence intermittency) is the
information codimension:

    $C_1 = K'(1)$.  (A25)

This quantity can be obtained directly from the weights since Eqs. (A23a) and (A25)
yield $K'(1) = \langle W\log_\lambda W\rangle$. For monofractal models, the linear formula for $K(q)$ in Eq.
(A21) yields $K'(1) = -K(0^+)$, hence $C_0 = C_1$. We also note that, $W\log_\lambda W$ being a convex
function of $W$, Jensen’s (1906) inequality tells us that $C_1 = \langle W\log_\lambda W\rangle \ge \langle W\rangle\log_\lambda\langle W\rangle = 0$.
Yet another way of parameterizing intermittency uses the correlation codimension

    $C_2 = K(2)$.  (A26)

This 2nd-order statistic is the preferred choice in the turbulence literature, where $C_2$ is in
fact referred to as the “intermittency parameter” (and denoted “$\mu$”). It determines the
measure’s spectral exponent^

    $\beta_\varepsilon = 1 - K(2) < 1$.  (A27)

The inequality follows from $K(q)$’s convexity and $K(1) = 0$ which imply $K(2) > 0$. This
establishes the stationarity of cascade processes according to the criterion in Eq. (28a).
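As a worked illustration of Eqs. (A24)-(A27), take the Bernoulli weights of the Cantor construction ($\lambda = 3$; $W = 0$ with probability 1/3, $W = 3/2$ with probability 2/3). The term $W\log_3 W$ is taken as 0 at $W = 0$ (its limiting value), which is an assumption of this sketch:

```latex
\begin{align*}
C_0 &= -K(0^+) = -\log_3\tfrac{2}{3} = 1 - \log_3 2 \approx 0.369,\\
C_1 &= \langle W \log_3 W \rangle
     = \tfrac{2}{3}\cdot\tfrac{3}{2}\,\log_3\tfrac{3}{2}
     = 1 - \log_3 2 \approx 0.369,\\
C_2 &= K(2) = \log_3\!\left(\tfrac{2}{3}\cdot\tfrac{9}{4}\right)
     = \log_3\tfrac{3}{2} = 1 - \log_3 2 \approx 0.369,\\
\beta_\varepsilon &= 1 - K(2) = \log_3 2 \approx 0.631 < 1 .
\end{align*}
```

All three codimensions coincide, as expected for a monofractal measure with linear $K(q)$, and the spectral exponent equals the fractal dimension of Cantor's set.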

A.3.3  The Canonical Log-Normal Model, Stationarity in Presence of Intermittency

Consider a specific example: we use the Cantor measure construction ($\lambda = 3$) with
log-normal rather than Bernoulli weights. Instead of $W_i = 0$ (Prob 1/3) or 3/2 (Prob 2/3),
we take:

    $W_i = \exp[N(\mu,\sigma)] = \exp[\mu + \sigma N(0,1)] \quad (i = 1,\ldots,n)$,  (A28a)

with

    $\mu = -\sigma^2/2$  (A28b)

to properly normalize the cascade, $\langle W\rangle = 1$.

In Fig. A4b, we illustrate the first steps of the construction using $\sigma = 0.4$. Notice the
development of multiple singularities. At the same time, most of $\varepsilon_n(x)$’s values become
smaller as $n$ is incremented. Indeed, at constant $x$, $\ln\varepsilon_n(x) = \sum_{i=1}^{n}\ln W_i = \sum_{i=1}^{n}[\mu + \sigma N(0,1)]$ is
executing a random walk as a function of (discrete) “time” $n$, like the Bm described in
§A.1.1 but with a systematic drift in the negative direction, due to (A28b).
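A canonical log-normal cascade following Eqs. (A28a,b) can be generated in a few lines. This is a minimal sketch under stated assumptions: the function name, the `seed` argument and the list-based storage are illustrative choices, and each sub-eddy receives an independent weight (canonical, so the total measure is conserved only on average).

```python
import math
import random

def lognormal_cascade(n, sigma, lam=3, seed=0):
    """Return the lam**n values eps_n(x) of a 1D canonical log-normal cascade."""
    rng = random.Random(seed)
    mu = -0.5 * sigma ** 2                    # Eq. (A28b): ensures <W> = 1
    cells = [1.0]
    for _step in range(n):
        nxt = []
        for value in cells:
            for _child in range(lam):
                w = math.exp(rng.gauss(mu, sigma))   # Eq. (A28a)
                nxt.append(value * w)
        cells = nxt
    return cells

if __name__ == "__main__":
    eps = lognormal_cascade(6, 0.4)
    print(len(eps), min(eps), max(eps), sum(eps) / len(eps))
```

The sample mean of a single realization fluctuates around 1 rather than equaling it exactly, which is the practical signature of a canonical (as opposed to microcanonical) construction.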

Log-normal cascades were first introduced in turbulence theory by Kolmogorov
(1962) and Obukhov (1962) to account for the effects of intermittency in the kinetic
energy dissipation rate at high Reynolds numbers; $\sigma$ around 0.7 was found to fit the data.
We recall that, like SGSNs (§A.2.4), the model in Fig. A4b exemplifies stationarity
according to criterion (28a) in the main text, cf. Eq. (A27). However, in sharp contrast
with SGSNs, these cascades are highly non-Gaussian, not unlike the Levy-stable w.n.


^This follows from $\langle\varepsilon(x)\varepsilon(x+r)\rangle \sim r^{-K(2)}$ (Monin and Yaglom 1975, pp. 618-620) which is in Fourier
duality with the energy spectrum (Wiener-Khinchin theorem) $E_\varepsilon(k) \sim k^{-\beta_\varepsilon}$ with $\beta_\varepsilon + K(2) = 1$.

described previously. This is traceable to the multiplicative nature of the construction. In
contrast with the white Levy-stable noise, cascades have more interesting correlation
properties due to the recursive nature of their construction.

Equations (A23a) and (A28a,b) yield^

    $K(q) = \frac{\sigma^2}{2\ln\lambda}\,q(q-1)$.  (A29)

Using (A25), this can be rewritten^

    $K(q) = C_1 q(q-1), \quad q < q_D = d/C_1$,  (A30)

by identifying $C_1$ with $\sigma^2/(2\ln\lambda)$. The condition on $q$ in Eq. (A30) accounts for the
divergence of moments, as predicted by Eq. (A23b).

In Fig. A5a, we have plotted $K(q)$ for a log-normal model in $d = 1$ with $C_1 = 0.25$;
notice the divergence of moments, formally $K(q) = \infty$, for $q > q_D = 4$. For comparison,
we have also plotted $K(q)$ for two of the monofractal models discussed previously, Dirac
$\delta$’s and Cantor measures. Figure A5b shows a sequence of log-normal cascade models
with increasing degrees of intermittency: $\sigma = 0$ ($C_1 = 0$); $\sigma = 0.25$ ($C_1 = 0.05$); $\sigma = 0.5$
($C_1 = 0.18$); and $\sigma = 1.0$ ($C_1 \approx 0.72$). Notice the increasing concentration and spikiness.
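The mapping from $\sigma$ to $C_1$ in Eq. (A30) is easy to tabulate. In the sketch below the function names are hypothetical; the branching ratio $\lambda = 2$ is an inference, chosen because it reproduces the $C_1$ values quoted for Fig. A5b to two decimals, and is not stated in the text.

```python
import math

def C1_lognormal(sigma, lam):
    """C_1 = sigma^2 / (2 ln lambda), from Eqs. (A29)-(A30)."""
    return sigma ** 2 / (2.0 * math.log(lam))

def K_lognormal(q, C1):
    """Eq. (A30): K(q) = C_1 q (q - 1), valid for q < q_D."""
    return C1 * q * (q - 1.0)

def q_D(C1, d=1):
    """Critical order for divergence of moments, q_D = d / C_1."""
    return d / C1

if __name__ == "__main__":
    for sigma in (0.25, 0.5, 1.0):
        print(sigma, round(C1_lognormal(sigma, 2), 2))   # lambda = 2 is assumed here
    print(q_D(0.25))                                      # 4.0, as in Fig. A5a
```

At $q = q_D$ the quadratic $K(q)$ meets the line $d(q-1)$, which is where the moments start to diverge in $d$ dimensions.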

Figure A5: Log-Normal Cascade Models in 1D, with More-or-Less Intermittency. (a) Exponents $K(q)$ for
a log-normal multifractal case and two monofractal cases; the onset of divergence for $q > 4$ is indicated for
the 1D log-normal model with $C_1 = 0.25$. (b) Four cascades with $\langle\varepsilon\rangle = 1$ that illustrate an increasing degree
of intermittency: $C_1 = 0.0$, 0.05, 0.18, and 0.72. Models with $C_1 > 1$ would be “degenerate” in 1D,
meaning almost empty in most realizations but a huge peak would occur now and then.



^The characteristic function for normal deviates is $\langle\exp[itN(\mu,\sigma)]\rangle = \exp[it\mu - t^2(\sigma^2/2)]$, where we set $t = q/i$.
^Schertzer and Lovejoy (1987) propose a class of log-Levy cascade models parameterized by $\alpha \in [0,2]$.
They run the gamut between beta-models in Eq. (A21) and log-normal models in Eq. (A29): if $\alpha \neq 1$,
$K(q) = C_1(q^\alpha - q)/(\alpha - 1)$, and $K(q) = C_1 q\ln q$ for $\alpha = 1$.

A.3.4  The Microcanonical Log-Binomial “p-model”

Meneveau and Sreenivasan’s (1987b) one-dimensional “p-model” is a microcanonical
alternative to the canonical log-normal model. Here $\lambda = 2$ and the weights are

    $W_i = 1 \pm (1-2p) = \begin{cases} W_- = 2p & (\mathrm{Prob} = 1/2) \\ W_+ = 2-2p & (\mathrm{Prob} = 1/2) \end{cases}, \quad 0 < p < 1/2 \quad (i = 0,\ldots,n-1)$  (A31a)

in one sub-eddy, and

    $W_i' = 2 - W_i$  (A31b)

in the other, to conserve the total measure (cf. Fig. A6). The microcanonical nature of
this model means that every sequence of ±’s yields the same binomially distributed
values for $\varepsilon_n(x)$, $W_-^m W_+^{n-m}$ ($m = 0,\ldots,n$) with probability $\binom{n}{m}/2^n$; only their order of
occurrence changes. Such predictability (exact ergodicity in the sense of singular
measures) in a calibration procedure for singularity analysis is obviously desirable.

The choice of weights in Eq. (A31a) yields

    $K(q) = \log_2[W_-^q/2 + W_+^q/2] = \log_2[(2p)^q + (2-2p)^q] - 1$,  (A32)

for^ $q \in \Re$. Exponents of special interest are

•  the information dimension: $D_1 = 1 - C_1 = -[p\log_2 p + (1-p)\log_2(1-p)]$ from $q = 1$; and

•  the spectral exponent: $\beta_\varepsilon = 1 - \log_2[1 + (1-2p)^2]$ from $q = 2$.

The latter is plotted as a function of $p$ in Fig. A7a where the inset illustrates a typical
realization.  The  sample  in  question  was  generated  using  p  =  0.35,  as  suggested  by 
Meneveau  and  Sreenivasan’s  (1987a)  measurements  of  the  dissipation  field  in  turbulent 
flows; its more important characteristics are $D_1 \approx 0.93$ ($C_1 \approx 0.07$), and $\beta_\varepsilon \approx 0.88$. The
two limiting cases are familiar: $p \to 0^+$ yields randomly placed $\delta$-functions, $p \to 1/2^-$ (or
$\sigma \to 0^+$ for the log-normal case) weak $1/f$-type fluctuations from the unitary mean value.
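The p-model and the exponents derived from Eq. (A32) can be sketched as follows. The function names and the `seed` argument are illustrative assumptions; the weights follow Eqs. (A31a,b), with the second sub-eddy receiving $2 - W_i$ so that the total measure is conserved at every step.

```python
import math
import random

def p_model(n, p, seed=0):
    """Return the 2**n values of a 1D microcanonical p-model cascade."""
    rng = random.Random(seed)
    cells = [1.0]
    for _step in range(n):
        nxt = []
        for value in cells:
            w = 1.0 + rng.choice((-1.0, 1.0)) * (1.0 - 2.0 * p)   # W_i, Eq. (A31a)
            nxt.extend([value * w, value * (2.0 - w)])             # W_i' = 2 - W_i
        cells = nxt
    return cells

def K_p(q, p):
    """Eq. (A32)."""
    return math.log2((2.0 * p) ** q + (2.0 - 2.0 * p) ** q) - 1.0

def D1(p):
    """Information dimension, D_1 = 1 - C_1."""
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def beta_eps(p):
    """Spectral exponent, beta_eps = 1 - K(2)."""
    return 1.0 - math.log2(1.0 + (1.0 - 2.0 * p) ** 2)

if __name__ == "__main__":
    eps = p_model(10, 0.35)
    print(sum(eps) / len(eps))            # 1 up to rounding: measure is conserved
    print(D1(0.35), beta_eps(0.35))
```

Two realizations with different seeds contain the same multiset of values, only in a different order, which is the ergodicity property that makes this model convenient for calibration.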


A.3.5  The “p(3)-model”, A 2D Generalization of the “p-model”

The most general microcanonical cascade with $\lambda = 2$ in $d = 2$ calls for three
parameters that we will denote

    $0 < p_1, p_2, p_3 < 1/2$.  (A33)

They will be used to shift mass in the same way as in the p-model but in both horizontal
(E-W) and vertical (N-S) directions as well as between the two diagonals:

    E $\Leftrightarrow$ W:            $1 \pm (1-2p_x)$
    N $\Leftrightarrow$ S:            $1 \pm (1-2p_y)$      (A34)
    NE/SW $\Leftrightarrow$ NW/SE:    $1 \pm (1-2p_d)$

^We have $K(q) \approx q\log_2 W_\pm - 1$ for $q \to \pm\infty$. From Eq. (A23b), and knowing that $W_+ = 2-2p < 2$, the large $q$
limit tells us that all moments converge for microcanonical models. Not only do they converge but (by
construction) their estimates for every realization are identical, even for a single cascade step.

[Figure A6 panel annotations: Start with uniform slab; transfer fraction $F_1$ of the mass; then transfer fraction $F_2$ within each half; and so on ...]

Figure A6: Genesis of a Microcanonical Cascade in 1D. Three steps are illustrated. The total mass is
conserved at each step, implying anti-correlated multiplicative weights in each sub-cell: $W_i + W_i' = 2$, hence
$W_i = 1 \pm F_i$ and $W_i' = 1 \mp F_i$.


Figure A7: Spectral Exponents of Singular and Bounded Cascades. (a) Stationary but intermittent case
$H = 0$, reverting to Meneveau and Sreenivasan’s (1987) “p-model” for the kinetic energy dissipation rate
that occurs at the Kolmogorov scale in fully developed turbulent flows; this model is unbounded
(“singular”) in the limit of many cascade steps. (b) Nonstationary $H > 0$ generalization proposed by
Cahalan et al. (1990, 1994) to model cloud optical depth variability in stratocumulus decks that are
observed to have power-law spectra with $\beta = 5/3$.

where $\{p_x, p_y, p_d\}$ is one of the 3! = 6 permutations of $\{p_1, p_2, p_3\}$, with equal
probability and random ±. After the three transfers, the individual combined weights are

    $W = \prod_{e\in\{x,y,d\}} W_e = \prod_{k=1}^{3}[1 + s_k(1-2p_k)]$  (A35)

where $\{s_k;\ k = 1,2,3\} = \{s_1, s_2, s_3\} = \{\pm, \pm, \pm\}$, one of the 8 equally probable combinations.
These weights fall between

    $W_\pm = \max/\min\{W\} = \prod_{k=1}^{3}[1 \pm (1-2p_k)]$.  (A36)

For this choice of $W$’s,

    $\langle W^q\rangle = \frac{1}{8}\sum_{\{\pm,\pm,\pm\}}^{8\ \mathrm{combos}}\ \prod_{k=1}^{3}[1 \pm (1-2p_k)]^q$,  (A37)

for $q \in \Re$, hence $K(q)$ from Eq. (A23a). If $p_1 = p < 1/2$ and $p_2 = p_3 = 1/2$, we have
only two equally probable weights $W_\pm$, and Eq. (A32) is retrieved.
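The reduction of the p(3)-model to the p-model can be checked numerically from Eq. (A37). This is a sketch with hypothetical function names; it assumes $K(q) = \log_2\langle W^q\rangle$, i.e., Eq. (A23a) with $\lambda = 2$, and treats the three signs as independent and equally probable.

```python
import math
from itertools import product

def mean_Wq(q, p1, p2, p3):
    """<W^q> of Eq. (A37): average over the 8 equally probable sign combinations."""
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=3):
        w = 1.0
        for s, p in zip(signs, (p1, p2, p3)):
            w *= 1.0 + s * (1.0 - 2.0 * p)     # one factor of Eq. (A35)
        total += w ** q
    return total / 8.0

def K_p3(q, p1, p2, p3):
    """K(q) = log_2 <W^q>, from Eq. (A23a) with lambda = 2."""
    return math.log2(mean_Wq(q, p1, p2, p3))

if __name__ == "__main__":
    p = 0.35
    for q in (0.5, 2.0, 3.0):
        k_1d = math.log2((2 * p) ** q + (2 - 2 * p) ** q) - 1.0   # Eq. (A32)
        print(q, K_p3(q, p, 0.5, 0.5), k_1d)
```

Since $\langle W^q\rangle$ is symmetric in $(p_1, p_2, p_3)$, the random permutation of the parameters among the three directions does not affect $K(q)$.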

A.4  Going from Stationary Measures to Nonstationary Multifractal Functions

It is important to have the option of multiscaling as well as monoscaling (section A.2)
in the realm of nonstationary functions with stationary increments. We therefore need
algorithms that generate fields where structure-function analysis yields a nonlinear $\zeta(q)$,
equivalently, a non-constant $H(q)$. There are many well-documented methods for
generating stationary multifractal measures using multiplicative cascades; in contrast, the
literature on specific ways of generating nonstationary functions with multiscaling
structure functions is relatively small. We present two procedures here (Schertzer and
Lovejoy, 1987; Cahalan et al., 1990) and refer to Vicsek and Barabasi (1991), Arneodo et
al. (1992), Benzi et al. (1993), and Sykes et al. (1996) for the few others we are aware of.

A.4.1  “Bounded” Cascades

One route from multiplicative cascades to nonstationary multiscaling processes was
charted by Cahalan et al. (1990) for the purposes of modeling the internal structure of
marine stratocumulus. Their model builds on Meneveau and Sreenivasan’s (1987)
p-model, calling for a power-law decay of the variance of the weights as the
multiplicative cascade proceeds to smaller and smaller scales, $r_i = L/2^i$ ($i = 0,\ldots,n$).
Explicitly, we take

    $W_i = 1 \pm (1-2p)(r_{i-1}/L)^H$ (equal Prob ±), $0 < p < 1/2$, $0 < H < \infty$,  (A38a)

    $W_i' = 2 - W_i$,  (A38b)

for $i \ge 1$. Figure A7b shows the effect of $H$ on the spectral exponent: $\beta = \min\{2H,1\}+1$
(Cahalan et al., 1994); the inset shows examples for $H = 1/3$, 1.7.

Marshak et al. (1994) show that the resulting model, $f_n(x)$, is multifractal in the sense
that the higher-order structure functions $\langle|f_n(x+r)-f_n(x)|^q\rangle$ scale with exponents $\zeta(q) = qH(q)$ where

    $H(q) = \min\{H, 1/q\}$.  (A39)

Interestingly, $p$ does not appear in Eq. (A39) which determines the scaling of 2-point
statistics of all orders; Cahalan et al. (1994) investigate its role in 1-point statistics, hence
prefactors in Eq. (16) of the main text. Clearly, $H$ controls the degree of nonstationarity.
The p-model is retrieved in the “stationary” limit $H \to 0$. At $H \to \infty$, we find the “most
nonstationary” in this continuum of models, Heaviside steps of height $2(1-2p)$ at $x = L/2$.
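A bounded cascade differs from the p-model only in the scale-dependent damping of the weights. The sketch below is illustrative: the function name and `seed` argument are assumptions, and it takes $r_i = L/2^i$ so that $(r_{i-1}/L)^H = 2^{-(i-1)H}$ in Eq. (A38a).

```python
import random

def bounded_cascade(n, p, H, seed=0):
    """Return the 2**n values f_n(x) of a 1D bounded cascade, Eqs. (A38a,b)."""
    rng = random.Random(seed)
    cells = [1.0]
    for i in range(1, n + 1):
        # Amplitude (1 - 2p)(r_{i-1}/L)^H with r_i = L / 2**i (assumed).
        amp = (1.0 - 2.0 * p) * 2.0 ** (-(i - 1) * H)
        nxt = []
        for value in cells:
            w = 1.0 + rng.choice((-1.0, 1.0)) * amp
            nxt.extend([value * w, value * (2.0 - w)])   # W_i' = 2 - W_i
        cells = nxt
    return cells

if __name__ == "__main__":
    f = bounded_cascade(10, 0.35, 1.0 / 3.0)
    print(min(f), max(f), sum(f) / len(f))
    step = bounded_cascade(4, 0.35, 60.0)     # very large H: a single Heaviside step
    print(step[0], step[-1])
```

For $H > 0$ the product of the largest possible weights converges as $n$ grows, which is the sense in which the field stays bounded; with $H = 0$ the code reduces to the p-model.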

Figure  A8  illustrates  two  important  properties  of  bounded  cascades  which  are  shared 
by  other  multiscaling  random  functions  (as  well  as  fBm’s,  their  monoscaling 
counterparts):  stochastic  continuity  and  self-affinity.  Two  identical  sequences  of 
successive  horizontal  zooms  are  illustrated;  the  difference  is  only  that  on  the  l.h.s.  the 
vertical  scale  is  held  constant  and  on  the  r.h.s  it  is  rescaled  by  a  given  factor  at  each 
zoom.  On  the  l.h.s.  we  see  the  amplitude  of  the  increments  decrease  dramatically  with 
scale  (continuity  property).  However,  on  the  r.h.s.  we  see  that  zooms  onto  different 
portions of the graph of $f_n(x)$ are statistically similar: the graph is self-affine with fractal
dimension $D_{\mathrm{graph}} = 2 - H_1 = 5/3$ in this $H_1 = H = 1/3$ case.


Figure A8: Stochastic Continuity and Self-Affinity of a 1D Bounded Cascade Model. The sequence of
zooms on the l.h.s. (without vertical rescaling) shows how smaller scales lead to smaller increments; this is
called  “stochastic  continuity.”  The  r.h.s.  sequence  (with  vertical  rescaling)  yields  three  graphs  that  are 
statistically  indistinguishable  from  the  original  at  the  top;  this  property  is  “self-affinity.” 


Figure A9 shows an example of a 2D generalization of the above bounded cascade
model starting with the “p(3)-model.” The factors $1-2p_k$ ($k = 1,2,3$) in Eqs. (A34-36) are
multiplied by the variance moderating factor $(r_{i-1}/L)^H$ as in Eq. (A38a). This
type of model was used by Cahalan (1994), Marshak et al. (1995a) and Davis et al.
(1996b) in radiative transfer computations.

A.4.2  Random Devil’s Staircases

The simplest way of obtaining a function from a multifractal measure is to take its
indefinite integral:

    $f_\varepsilon(x) = \int_0^x \varepsilon(x')\,dx', \quad 0 < x < L$.  (A40)

Since $\varepsilon(x) \ge 0$, this random “Devil’s staircase” (Mandelbrot 1983) is a non-decreasing
function, as illustrated by the typical realization based on a log-normal cascade in Fig.
A10a. The increments of (A40) are easily computed, using the definition of a
coarse-grained measure in Eq. (14) in the main body:

    $f_\varepsilon(x+r) - f_\varepsilon(x) = \int_x^{x+r}\varepsilon(x')\,dx' = r\,\varepsilon(r;x), \quad 0 < x < L-r,\ 0 < r < L$.  (A41)

The $q$th order moment of this increment is directly related to that of $\varepsilon(r;x)$:

    $\langle|f_\varepsilon(x+r) - f_\varepsilon(x)|^q\rangle = r^q\langle\varepsilon(r;x)^q\rangle, \quad q < q_D$.  (A42)

Using the definitions in Eqs. (16) and (20) and taking logs, we find $\zeta(q) = q - K(q)$;
equivalently,

    $H(q) = 1 - K(q)/q, \quad q < q_D$.  (A43)

In particular, integrals of randomly placed $\delta$-functions, with $K(q) = q-1$ ($q > 0$), yield
randomly placed Heaviside steps which are multifractal functions with a broad range of
Hölder exponents: $H(q) = 1/q$ ($q > 0$). The limit $H \to \infty$ in Eq. (A39) confirms this
result, that Marshak et al. (1996) derive from first principles as well.
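On a grid, the integral in Eq. (A40) becomes a running sum, and the increment identity of Eq. (A41) holds exactly. The following is a minimal sketch; the function names and the convention that each cell value is constant over a cell of width $L/N$ are assumptions made for illustration.

```python
def devils_staircase(cells, L=1.0):
    """Running integral f(x_m), m = 0..N, of a measure given as N equal-width cell values."""
    dx = L / len(cells)
    f = [0.0]
    for value in cells:
        f.append(f[-1] + value * dx)      # discrete form of Eq. (A40)
    return f

def coarse_grained(cells, m, k):
    """eps(r; x_m): average of the measure over k cells starting at cell m."""
    return sum(cells[m:m + k]) / k

if __name__ == "__main__":
    # Two steps of a p-model with p = 0.35 (weights 0.7 and 1.3), written out by hand.
    cells = [0.49, 0.91, 0.91, 1.69]
    f = devils_staircase(cells)
    r = 2 * (1.0 / len(cells))
    print(f)
    print(f[3] - f[1], r * coarse_grained(cells, 1, 2))   # both sides of Eq. (A41)
```

Because every cell value is non-negative, the returned list is non-decreasing, and its last entry equals the total measure.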

A.4.3  Fractionally Integrated Singular Cascades

To simulate the scale-invariant spatial properties observed in cloud fields, Schertzer
and Lovejoy (1987) generalize the Devil’s staircase concept. As a means of introducing
the continuity that necessarily comes with nonstationarity, they simply use fractional
integration instead of its standard counterpart. This yields what we shall call a Fractionally
Integrated Singular Cascade (FISC). We use here the same form of fractional integration
as in §A.2.3, low-pass power-law filtering in Fourier space. Consider a 1D grid of size
$M_n = 2^n$ and unitary outer scale,

    $x_m = m/M_n, \quad m = 0,\ldots,M_n-1 \quad (L = 1)$,  (A44)

where we construct an $n$-step cascade process with branching ratio $\lambda = 2$. This field is
generated in physical space and its Fourier representation is computed numerically:

    $\varepsilon_n(x_j),\ j = 0,\ldots,2^n-1$
    $\tilde{\varepsilon}_n(k),\ k = 0,\pm 1,\ldots,\pm 2^{n-1}$.      (A45)

Figure A10: Hybrid Multiplicative/Additive Models. (a) Random Devil’s Staircase: both the measure (“cascade process”) and
its integral (“Devil’s staircase”) are illustrated. (b) Same as in panel (a) but for a fractional integration of order 1/3.

The measure $\varepsilon(x)$ is then “smoothed” into a function using

    $\tilde{f}_n(k) = A(H^*)\,\tilde{\varepsilon}_n(k)\times|k|^{-H^*}$  (A46a)

where (e.g., Gradstein and Ryzhik 1980)

    (A46b)

Figure A10b shows the outcome for a log-normal cascade similar to that in Fig. A10a
using a fractional order of integration $H^* = 1/3$. The exponent $H^*$ of the power-law filter
is the nonstationarity parameter of the model, given directly by the spectral exponent $\beta$ we
want for $f(x)$:

    $H^* = (\beta - \beta_\varepsilon)/2$,  (A47)

given that of $\varepsilon(x)$,

    $\beta_\varepsilon = 1 - K(2)$,  (A48)

where $K(2)$ defines the scaling of $\langle\varepsilon(r;x)^2\rangle \sim r^{-K(2)}$. As for fBm using the Fourier
construction (§A.2.3), generalization to 2D is straightforward.

We need to estimate the scaling exponents for this model for $q \neq 2$ (at $q = 2$, the
scaling is determined exactly by the construction algorithm). For this, we tentatively
interpret Eq. (A46a) as

    $|f(x+r) - f(x)| \sim r^{H^*}\varepsilon(r;x)$;  (A49)

taking $q$th powers, and averaging yields $\zeta(q) = qH^* - K(q)$, given the exponent definitions
in main text’s Eqs. (16) and (20). This $\zeta(q) \leftrightarrow K(q)$ connection is a special case of Eq.
(26) with $a = 1/H^*$ and $b = 1$, leading to

    $H(q) = H^* - K(q)/q, \quad q < q_D$.  (A50)

So this generalizes Eq. (A43) for Devil’s staircases where $H^* = 1$.
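As a consistency check at $q = 2$, the relation $\zeta(q) = qH^* - K(q)$ can be compared with the effect of the power-law filter on the spectrum. The sketch below assumes the standard link $\beta = \zeta(2) + 1$ between the 2nd-order structure function and the spectral exponent (valid for $1 < \beta < 3$); the numerical values are hypothetical and chosen only for illustration.

```latex
\begin{align*}
\zeta(2) &= 2H^* - K(2),\\
\beta &= \zeta(2) + 1 = 2H^* + [1 - K(2)] = \beta_\varepsilon + 2H^* ,
\end{align*}
% which is what multiplying the Fourier amplitudes by |k|^{-H^*} does to a
% spectrum k^{-\beta_\varepsilon}. Illustration with a log-normal cascade,
% K(2) = 2 C_1, taking C_1 = 0.05 and H^* = 1/3 (hypothetical values):
\begin{align*}
\beta_\varepsilon &= 1 - 2(0.05) = 0.90, &
\beta &= 0.90 + \tfrac{2}{3} \approx 1.57 .
\end{align*}
```

The agreement at $q = 2$ is exact by construction; for other values of $q$ the relation is only the tentative estimate discussed in the text.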

However, the general (event-wise) applicability of (A49), hence (A50) for all values
of $q$, to FISCs is questionable. Eq. (A50) is nonetheless guaranteed by construction to
work for $q = 0$ and $q = 2$; between these two values it provides at least a good
approximation to numerically obtained $\zeta(q)$’s. As $q$ increases far beyond 2, we are
effectively emphasizing ever larger events in $\varepsilon(r;x)$ on the r.h.s. of (A49). It is easy to
imagine situations where the increment $f(x+r)-f(x)$ on the l.h.s. of (A49) is small,
although the underlying measure $\varepsilon(r;x)$ is large. For instance, take $x+r$ and $x$ on either
side of a strong spike in Fig. A10b. Consequently, the agreement between the statistics
independently determined on either side of Eq. (A49) deteriorates.^


^  Davis  et  al.  (1993,  1994&)  describe  a  more  general  approach  for  characterizing  the  correlations  between 
some  nonstationary  process /(^),  artificial  or  real,  and  the  associated  £(jc)  field. 
Table A1 (begin): Scale-Invariant Processes. 2nd-order statistics and associated stationarity and continuity
properties.


Domains? 

Parameters? 

Stationarity? 

Model 

Sect. 

Figure 

xe[0,Z.)^, 

w,  /or  e 

primary 

secondary 

...  per  se^ 

(P<  1) 

...of 

(1 

incsl* 

5<3) 

exponents: 

^ _ 

1 

1 _  1 

I. 

Discrete  model  for  vvtjc)  =  fix) 

.  without  correlations: 

r 

Bernoulli  w.n. 

A. 1.1 

1 

W  =  +5 

5  >  0 

0 

V 

I 

Bm  on  a  grid 

A.1.1 

1 

f/s 

5  >  0 

2 

V 

11. 

Gaussian  model  for  w(a:)  =/Tx'i.  without  correlations: 

a 

Gaussian  w.n. 

A.1.1 

Ala 

1,2 

w  e  SR 

a>  0 

0 

V 

v/ 

~ 

A.1.1 

k\b 

1 

fe  SR 

a>0 

fo 

2 

V 

1  in.  Non-Gaussian  model  for  w(r)  = 

f'(x).  without  correlations: 

m 

Ala 

IB 

0 

0 

■1 

Levy-flight 

A.1.2 

Alb 

1 

fe^ 

0<  a<2 

_ 

2 

.±J 

TV.  Gaussian  models  forf’(jc)and  f(je),  with  correlations: 

a 

SGSN+ 

IB 

-1  <  //2  <  0 

o 

I-2I//2I 

Ml 

b 

fBm 

A.2 

A3 

IB 

/eSR 

0  <  //2  <  1 

a 

2//2+I 

■■ 

V. 

Cascade  models  e(jc) 

and  (Tjc'i.  with  microcanonical  conservation: 

p-model 

1, 

e>0 

0<p<  1/2 

l-log2[l+(l-2p)2) 

3p-model 

nil 

2 

0<pi<  1/2 

l-log2(w2). 

nn 

(/=  1,2,3) 

with  q=2  in  (A33). 

b 

Bounded 

A.4.1 

A6-8 

I, 

/-W+ 

o 

A 

a; 

P> 

min{2//,i}+l 

V 

Cascades 

A9 

2 

(0</_  < 

Pi 

/+<“) 

(i=l,2,3) 

VI.  Cascade  models  for  e(jc)  and  a 

“hybrid”  one  for  fix),  with 

canonical  conservation: 

B 

Beta-models 

A4a,  A5c 

IB 

e>0 

0  <  Df<  d 

\-(d-D() 

ira 

IDI 

B 

A4b.  A5 

IB 

E  >0 

ainw>  0 

1-Cfinw^/In2 

ira 

10 

b 

FISC* 

A.4.3 

AlOfc 

1,2 

V 

o 

//*>{! -P£)/2 
p,  Do,  or  c\nw 

2//*+Pe 

□ 

¹ Models designated with an "a" are stationary in the "broad" sense, where the autocorrelation function depends only on the separation: $\langle w(x+r)w(x)\rangle \propto \delta(r)$ (I-III); $\langle f'(x+r)f'(x)\rangle \propto r^{2H_2}$ (IV, with $-1/2 < H_2 < 0$); $\langle \varepsilon(x+r)\varepsilon(x)\rangle \propto r^{-K(2)}$ (V-VI, with $0 < K(2) = 1-\beta < 1$). These models are also stochastically discontinuous since, e.g., $\langle [w(x+r)-w(x)]^2\rangle \neq 0$ for $r > 0$.

² The associated models designated with a "b" are (broad-sense) nonstationary but stochastically continuous in the sense that, as $r \to 0$, $\langle [f(x+r)-f(x)]^2\rangle \propto r^{\zeta(2)} \to 0$ with $0 < \zeta(2) = \beta-1 < 2$. In categories I-III, one can go from model "a" to model "b" in $d = 1$ simply by taking a running sum; they are "additive" models.

³ Scaling Gaussian Stationary Noise.

* Fractionally Integrated Singular Cascade.


Table  A1  (end):  Scale-Invariant  Processes.  Multifractal  statistics  and  associated  ergodicity  properties. 


Multi¬ 

scaling? 

C(4)& 

_ -v 

Inter- 

mittency? 

for  w,  fort  & 
...  for  IVl’s 

Bi-fractal  properties? 

(position  in  17  =  1  plane) 

Ergodicity  Properties? 

(remarks) 

1 _ 

_J _ 

//l(w,/ore)  1  C](w,/ore)  |  CKlVI’s) 

130 

1130 

fm 

0 

N.A.  (negative  values) 

trivial  ergodicity 

N.A. 

no 

1/2 

warn 

0 

large  fluctuations  in  local  1-pt.  statistics  and 
realization-to-realization  differences. 

no 

N.A. 

N.A. 

no 

IHH 

0 

10^  events  are  generally  enough  to  sample  up  to 
=3a’s  (cf.  “3a”  rule). 

130 

1130 

_ 

N.A.(  ”  ”  ) 

EH 

cf.Ifc 

Q3II 

130 

1139 

0 

N.A.(  ”  "  ) 

EH 

moments  of  order  ^  >  a  are  divergent. 

130 

\kW:M 

^91 

min{l/a,I } 

N.A.(  •’  ”  ) 

EH 

same  as  If>  and  Ub,  but  worse  due  to  divergences. 

iW 

130 

1130 

0 

N.A.(  ”  ”  ) 

EH 

cf.  Ila 

moil 

130 

130 

199 

Hi 

N.A.(  ”  ”  ) 

EH 

cf.  lib 

no 

yes 

yes 

yes 

0 

p\og2P+(  1  -;5)log2(  1  -p)+ 1 

((l/d<7)log2<W9)l^l  = 
(VVlog2VV),  from  (A33). 

Ci(E)# 

(£(rjc))  dsitribution  is  independent  of  the 
realization. 

(e(/.:0))  is  independent  of  realization. 

no 

N.A.t 

min{//,l} 

0 

N.A.t 

although  on  a  bounded  domain,  between  f±  = 
ni=^[l±(l-2p)/2'ff],  the  pdf  has  lognormal-like 
skewness  and  the  associate  sampling  problems; 
increments  are  also  broadly  distributed. 

ESI 

^9 

0 

d-Df 

no  divergence  in  {t(rjc)‘t}. 

RSI 

1^1 

i^i 

0 

(Jlnw^/21n2 

HE 

(e(r,x')9)  divergenes  for  q  >  d/Cj. 

yes 

no 

no 

yes 

H* 

0 

Cie§ 

Kiq)  for  ly/l’s:  cf.  corresponding  e-model. 

† In standard singularity analysis using next-neighbor absolute differences, spurious scale-breaks occur due to the binary-tree construction of this model.

‡ Taking absolute gradients of a singular cascade model only emphasizes the spikes, and the same singularity spectrum is found (Lavallée et al., 1993).

§ Absolute gradients of a FISC have the same singularity properties as the original cascade (Lavallée et al., ibid.).
A.5 Summary


We  have  described  specific  algorithms  for  constructing  mono-  and  multiscaling 
stationary  measures  amenable  to  singularity  analysis  (Section  4.4  in  main  text),  as  well  as 
mono-  and  multiscaling  nonstationary  functions  with  stationary  increments  amenable  to 
structure  function  analysis  (Section  4.2  in  main  body).  In  each  case,  we  have  specified 
the  dependency  of  the  appropriate  scaling  exponents  on  the  parameters  of  the  model. 
One example in two spatial dimensions is described explicitly and, in all others,
generalization from 1D to 2D is straightforward. This collection of models is
comprehensive  enough  to  calibrate  and  study  the  sensitivity  of  any  standard  multifractal 
data  analysis  procedure  using  arbitrary  amounts  of  synthetic  data  with  controllable 
statistical  properties.  In  particular,  convergence  rates  of  spatial  averages  to  their 
ensemble  counterparts  can  be  investigated  since  the  models  have  variable  degrees  of 
ergodicity. Table A1 organizes the models by category, lists their parameters and
summarizes their properties, while Fig. A11 displays their inter-relations graphically.

Validation  of  analysis  procedures  is  only  one  application  for  stochastic  modeling.  In 
our  specific  area  of  research,  the  effects  of  internal  cloud  structure  on  radiative  properties, 
we  have  used  multifractal  models  as  artificial  clouds-in-a-computer  with  controllable 
properties. Extensive numerical experimentation, using Monte Carlo and other radiative
transfer techniques, has led us to new insight into the ways clouds affect the Earth's
radiative budget (Cahalan 1994) and ways of retrieving cloud properties from satellite
(Marshak et al. 1995a,b) and lidar (Davis et al. 1996b) data.


[Figure A11 diagram. Singularity analysis: $K(q) \equiv 0$ (absolute gradient-fields) versus $K(q) \neq 0$, $q \neq 0$ (measures or gradient-fields), the latter with linear or nonlinear $K(q)$. Hybrid Models. Structure function analysis: monoscaling versus multiscaling, with $\zeta(q) = qH_2$, $\zeta(q) = q/2$, $\zeta(q) = \min\{q/\alpha, 1\}$, $\zeta(q) = \min\{qH, 1\}$, and $\zeta(q) = qH^* - K(q)$.]

Figure A11: Classification of Scale-Invariant Models. The formulas for the $q$th-order structure function
exponents refer to the models in the last row that are nonstationary but with stationary increments.
Two detection schemes are described. The first, applied to a u' single hot-wire
signal from the buffer region of a turbulent boundary layer, considers
the fluctuating cycle as a unit to calculate short time averages. In the
second, VISA is applied to a u' single hot-wire signal from the separated
region of a backward facing step to detect the flapping oscillations.

1.  Introduction 


The existence of coherent structures in turbulent shear flows has
led to the search for new algorithms for their detection during the past
three decades. The discovery of coherent structures was made possible by
flow visualization techniques using hydrogen bubbles in a turbulent
water boundary layer by Kline et al. (1967) and using shadowgraphs in a
gaseous turbulent mixing layer by Brown and Roshko (1974).
Subsequently, hot wire anemometry has been employed extensively to
detect coherent structures using variable interval time averaging schemes
by Gupta et al. (1971) (viscous sublayer streaks), and Blackwelder and
Kaplan (1976) (burst frequency) amongst a large number of other
investigations. During the last decade, coherent structures have been
recognized in almost all types of turbulent shear flows. An example is the
relatively simple geometry of backward facing step (BFS) flow. It would
appear that coherent structures constitute an important and essential
physical aspect of all stationary and nonstationary turbulent flows in
which turbulence production is present.

The  distinguishing  features  of  coherent  structures  are:  i)  their 
localization  in  space  and  time,  and  ii)  their  relatively  high  kinetic  energy 
content  thereby  signifying  bursts  of  turbulence  or  energetic  events  as 
opposed  to  quiescent  flow  or  background  turbulence.  Conventional  long 
time  Reynolds  averaging  in  the  time  domain  and  Fourier  spectral 
analysis in the frequency domain have been unable to detect them: the
former due to masking of energetic events by the length of averaging, and
the latter due to lack of fixed periodicity of the energetic events.

Attempts have been made in the past to detect the coherent
structures using the VITA (Variable Interval Time Averaging) algorithm in
the time domain and the wavelet transform in the frequency domain.

The  VITA  scheme  of  Blackwelder  and  Kaplan  (1976)  employs  a  fixed 
short  time  averaging  and  has  been  used  extensively  to  detect 
the  burst  frequency  in  a  turbulent  boundary  layer.  The  wavelet 
transform,  on  the  other  hand,  requires  the  specification  of  some  generic 
shape  of  the  wavelet  (Farge,  1992)  and  therefore  is  similar  to  a  pattern 
recognition  algorithm  in  the  frequency  domain. 
The quantification of relatively high kinetic energy content in the
detection  criterion  continues  to  be  a  serious  problem,  although  a 
threshold  value  approximately  equal  to  the  long-time  variance  of  the 
turbulent  signal  has  been  employed  by  most  investigators. 

In the present paper, results obtained from a detection scheme
applied to a u' single hot-wire signal from the buffer region ($y^+ \approx 15$) of a
turbulent boundary layer ($Re_\theta \approx 2300$) are described. Here $y^+ = y u_*/\nu$ and
$Re_\theta = U_\infty \theta/\nu$, where $y$, $U_\infty$, $u_*$, $\theta$, and $\nu$ are vertical distance from the
wall, free stream wind speed, friction velocity, momentum thickness, and
kinematic viscosity, respectively. In place of a fixed period for short-time
averaging, the algorithm considers the fluctuating cycle of the turbulent
flow signal about the conventional long-time mean value as a unit of
averaging time. This results in variable averaging periods for the
coherent structures. The duration of a coherent structure, however, is
decided by using the long-time variance as the threshold value, as most
investigators have done in the past. Some of these results were
presented by Gupta and Kaplan (1987).

The  second  detection  scheme  is  variable  interval  spectral  averaging 
(VISA),  which  was  applied  to  a  u'  single  hot-wire  signal  from  the 
separated  layer  of  a  backward  facing  step  flow  to  detect  flapping 
oscillations  using  an  HP  35660A  Dynamic  Signal  Analyzer.  This  dynamic 
signal analyzer is able to vary the digitizing rate of the turbulent flow
signal as well as to vary the number of blocks used for averaging the FFT
spectra.  Details  of  this  investigation  are  given  by  Ananda  (1993)  and 
part  of  the  results  were  presented  by  Ananda  et  al.  (1992). 

2. Statistical Characteristics of Bursts in a Turbulent Boundary Layer

Application of the first detection scheme to a block of single hot-wire
data of 2048 samples (digitizing rate 2500 samples per second) is
shown in Figure 1, which is a hard copy of the computer video monitor.

The top trace shows the turbulent signal e(t) with the horizontal
solid line denoting the long-time average $\bar{E}$. The second trace shows the
plot of $[e^2]_c/\overline{e^2}$ for each successive fluctuation cycle. $[e^2]_c$ is the
short-time variance computed for each fluctuating cycle, while $\overline{e^2}$ is the
long-time variance. The solid line $[e^2]_c/\overline{e^2} = 1$ indicates the threshold
value used to separate energetic burst events and quiescent events. It
may be emphasized that the averaging time is the fluctuating cycle time
and is therefore variable in the true sense by being different for each
fluctuating cycle as shown in the second trace from above in Figure 1.
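
The cycle-based detection can be sketched in a few lines. This is an illustrative reading of the scheme, not the authors' program: the function name is hypothetical, and a "fluctuating cycle" is assumed here to run between successive upward crossings of the long-time mean, which the text does not specify.

```python
import numpy as np

def detect_cycles(e):
    """Return (start, stop, variance_ratio, is_energetic) for each fluctuation cycle."""
    e = np.asarray(e, dtype=float)
    fluct = e - e.mean()            # fluctuation about the long-time mean
    long_var = np.mean(fluct ** 2)  # long-time variance
    # Assumption: a cycle starts at each upward crossing of the mean.
    up = np.flatnonzero((fluct[:-1] < 0) & (fluct[1:] >= 0)) + 1
    cycles = []
    for start, stop in zip(up[:-1], up[1:]):
        ratio = np.mean(fluct[start:stop] ** 2) / long_var  # short-time / long-time variance
        cycles.append((int(start), int(stop), float(ratio), bool(ratio > 1.0)))
    return cycles
```

Samples before the first and after the last detected crossing are left unassigned in this sketch.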
Figure 1: Detection of Energetic and Quiescent Events;
$y^+ \approx 15$. Data Length = 2048 Samples.

The third trace, $[e^2]_b/\overline{e^2}$, shows the quiescent and energetic
events alternating. These were obtained by merging the adjacent
quiescent cycles into a quiescent event and adjacent energetic cycles into
an energetic event.

Finally, the last two traces eb(t) and eq(t) show the splitting of the
signal e(t) into an energetic burst signal and a quiescent signal,
respectively. The numbers 1-10 on the trace eb(t) identify the burst
events 1-10.

Having delineated the events in a turbulent flow signal, it was now
possible to compute the detailed statistical properties of the energetic
and quiescent events. An obvious quantity of interest is the mean
bursting period Tb. The bursting period is the time interval between two
successive energetic events eb(t). Tb is obtained from the histogram of
the bursting periods. Figure 2 shows the histogram of bursting period
carried out over the entire 50176 samples of a u' single hot-wire signal
from the buffer region of a turbulent boundary layer developing in a low
speed wind tunnel at the University of Southern California.
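
Merging adjacent cycles into events and measuring bursting periods can be sketched as follows. The helper names are hypothetical, and the bursting period is assumed here to be measured from the start of one energetic event to the start of the next, an interpretation consistent with the reported mean period (74 ms) being the sum of the mean energetic and quiescent durations (33.6 ms + 40.4 ms).

```python
import numpy as np

def merge_events(flags):
    """Merge runs of equal per-cycle flags into (first_cycle, end_cycle, is_energetic)."""
    flags = np.asarray(flags, dtype=bool)
    edges = np.flatnonzero(flags[1:] != flags[:-1]) + 1  # cycles where the flag changes
    bounds = np.concatenate(([0], edges, [flags.size]))
    return [(int(a), int(b), bool(flags[a])) for a, b in zip(bounds[:-1], bounds[1:])]

def bursting_periods(cycle_starts, flags, sample_rate_hz):
    """Time (s) between the starts of successive energetic events."""
    events = merge_events(flags)
    starts = np.array([cycle_starts[a] for a, _b, energetic in events if energetic],
                      dtype=float)
    return np.diff(starts) / sample_rate_hz
```

A histogram of the returned periods (for example with `np.histogram`) then gives the distribution from which a mean bursting period can be read.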
Figure 2: Histogram of Bursting Period;
Tb = 74 ms; N = 271


There  were  a  total  of  271  bursting  events  with  a  mean  bursting 
period  of  Tb  =  74  ms  and  standard  deviation  equal  to  3^  ms.  In 
dimensionless form, the values of mean bursting period were $T_b U_\infty/\delta = 3.7$
and $T_b u_*^2/\nu = 185$, to be compared with the value of $T_b U_\infty/\delta = 5.0$
of Rao et al. (1971). The symbol $\delta$ stands for boundary layer
thickness.

It was possible to compute the mean duration, $\bar{T}_i$, and the
standard deviation $\sigma_i$ of both the energetic and quiescent events
separately. As shown in Table 1 the mean duration of energetic events
was 33.6 ms compared to 40.4 ms for the quiescent events. Further, the
quantity $[e^2]_b/\overline{e^2}$ for the energetic events and quiescent events was
computed to be 1.574 and 0.522, respectively. This means that the mean
variance of energetic events is 1.574 times the long-time variance $\overline{e^2}$,
and that of the quiescent events is 0.522 times the long-time variance.

The fraction of total time $\gamma$ occupied by the energetic events was
computed to be 0.454. Thus, the fraction of total time occupied by
quiescent events was 1 - 0.454 = 0.546. The fraction of one dimensional
mean square turbulent burst energy was computed as $\beta = \gamma\,[e^2]_b/\overline{e^2}$.
Similarly $\beta$ could be computed for quiescent events using the
corresponding values of $\gamma$ and $[e^2]_b/\overline{e^2}$ as shown in Table 1. Kim et
al. (1971) measured $\beta = 0.75$ and $\gamma = 0.57$ for the bursting events in a
turbulent boundary layer at $y^+ \approx 15$, $Re_\theta \approx 650$. These values can be
compared with $\beta = 0.715$ and $\gamma = 0.454$ for the energetic events in Table 1.
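
As a quick arithmetic check, the energy fractions in Table 1 are reproduced if the energy fraction is taken as the time fraction multiplied by the ratio of event variance to long-time variance. The function name below is illustrative only.

```python
def energy_fraction(time_fraction, variance_ratio):
    """Energy fraction = time fraction * (event variance / long-time variance)."""
    return time_fraction * variance_ratio

beta_energetic = energy_fraction(0.454, 1.574)  # energetic events
beta_quiescent = energy_fraction(0.546, 0.522)  # quiescent events

print(round(beta_energetic, 3), round(beta_quiescent, 3),
      round(beta_energetic + beta_quiescent, 3))
# prints: 0.715 0.285 1.0
```

The two fractions summing to one (within rounding) is what one expects if every sample is assigned to either an energetic or a quiescent event.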




Table 1: CHARACTERISTICS OF ENERGETIC AND QUIESCENT EVENTS

                              Energetic    Quiescent
$\bar{T}_i$                   33.6 ms      40.4 ms
$\sigma_i$                    20.3 ms      24 ms
$\bar{T}_i U_\infty/\delta$   1.68         2.02
$\bar{T}_i u_*^2/\nu$         84           101
$[e^2]_b/\overline{e^2}$      1.574        0.522
$\beta$                       0.715        0.285
$\gamma$                      0.454        0.546

The probability density functions for the total signal e(t), the signal
corresponding to energetic events eb(t), and the signal corresponding to
quiescent flow eq(t) are shown in Figure 3. The addition of 22,771
samples of Figure 3b and 27,379 of Figure 3c results in a total of 50,150
samples, which is 26 samples short of the total of 50,176. These 26
samples appear to be lost due to numerical errors. Some very interesting
observations can be made from these three figures considered together.


Figure 3a: Probability Function of e(t).
Data Set = 50176 Samples


Figure 3b: Probability Function of eb(t).
Data Set = 22771 Samples


Figure 3c: Probability Function of eq(t).
Data Set = 27379 Samples

First,  the  probability  density  function  of  quiescent  flow  eq(t) 
modulates  the  probability  density  function  of  the  total  flow  e(t)  as  the 
shapes  of  the  curves  P(eq)  and  P(e)  are  similar.  Second,  the  probability 
density  function  of  energetic  events  P(eb)  shows  a  bi-modal  distribution. 
The important things to note are: i) the statistical features of the
energetic  events  eb(t)  (Figure  3b)  are  completely  masked  in  the  statistical 
features  of  the  total  flow  e(t)  (Figure  3a);  and  ii)  the  probability  density 
function  P(eq)  of  the  quiescent  events  is  not  Gaussian  and  reflects  the 
probability  density  function  P(e)  of  the  total  turbulent  fluctuations. 

It was possible to further subdivide the energetic events into
ejection-like events (u' < 0, v' > 0) and sweep-like events (u' > 0, v' < 0).
For the present single hot-wire data set this subdivision was carried out
by computing the short-time average value of each energetic event, [eb].
A short-time average with [eb] < 0 was taken as
indicating an ejection event, and [eb] > 0 a sweep event. Figure 4
shows the results of this detection algorithm, where the traces corresponding to
ebj(t) and ebs(t) show the ejection and sweep events, respectively.
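
The sign test on the short-time average can be sketched as below. The function name and the (start, stop) event representation are assumptions for illustration; an event whose average is exactly zero is arbitrarily labelled a sweep here, a case the text does not address.

```python
import numpy as np

def classify_events(fluct, events):
    """Label each energetic event by the sign of its short-time average.

    fluct  : fluctuation signal about the long-time mean
    events : iterable of (start, stop) sample indices of energetic events
    """
    fluct = np.asarray(fluct, dtype=float)
    labels = []
    for start, stop in events:
        short_time_mean = float(np.mean(fluct[start:stop]))
        labels.append("ejection" if short_time_mean < 0 else "sweep")
    return labels
```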
Figure 4: Detection of Ejection and Sweep Events;
$y^+ \approx 15$. Data Set = 2048 Samples

The traces corresponding to ebj(t) and ebs(t) indicate that ejection-like
and sweep-like events at a point in the flow do not always alternate.
There are times when two ejection events appear in
succession at a point. This result of the detection algorithm is in agreement with
flow visualization observations and also makes sense physically because
the occurrence of energetic events is not local in space. There is a
certain volume which each energetic event occupies as it occurs and
moves downstream.


Table 2: CHARACTERISTICS OF EJECTION AND SWEEP-LIKE ENERGETIC EVENTS

                              Ejection     Sweep        Energetic (Table 1)
$\bar{T}_i$                   32.8 ms      34 ms        33.6 ms
$\bar{T}_i U_\infty/\delta$   1.64         1.70         1.68
$\bar{T}_i u_*^2/\nu$         82           85           84
$[e^2]_b/\overline{e^2}$      1.726        1.386        1.574
$\beta$                       0.434        0.281        0.715
$\gamma$                      0.251        0.203        0.454

Table 2 presents the statistical mean characteristics of ejection-like
and sweep-like events. Out of a total of 271 energetic events for the
present data set, there were 152 ejection-like events and 110 sweep-like
events.

The  probability  density  functions  of  the  ejection-like  events  and 
sweep-like  events  are  presented  in  Figure  5  which  shows  some 
remarkably  distinct  statistical  properties  of  ejection  and  sweep  events. 
While both P(ebj) and P(ebs) are highly skewed, the ejection events are
positively  skewed,  and  sweep  events  are  negatively  skewed. 


Figure 5a: Probability Function of ebj(t).
Data Set = 12604 Samples.


Figure 5b: Probability Function of ebs(t).
Data Set = 10167 Samples.


Several  other  statistical  characteristics  of  energetic  and  quiescent  events 
as  well  as  their  spectral  characteristics  were  also  computed.  Similar 
computations  need  to  be  carried  out  across  the  entire  thickness  of  the 
turbulent  boundary  layer  to  establish  the  detection  algorithm  on  firm 
ground. 

3. Variable Interval Spectral Averaging (VISA) in a Backward Facing Step Flow

The HP 35660A dynamic signal analyzer digitizes a time record of
1024  samples  for  one  channel  to  generate  512  points  in  the  frequency 
domain  by  Fast  Fourier  Transform  (FFT)  and  displays  the  first  401  points 
while  discarding  the  rest.  As  many  as  20  different  frequency  spans 
(digitization  rates)  ranging  from  0.2  Hz  (1  sample  every  2  sec)  to  102.4 
KHz  (262144  samples  per  second)  are  available.  Furthermore,  the 
number  of  time  records  (NTR)  used  for  averaging  of  spectra  can  be  chosen 
from  1  to  99999  time  records  and  displayed  on  the  monitor  as  well  as 
stored  on  a  floppy  disk  either  sequentially  with  a  constant  number  of 
time  records  or  cumulatively  with  number  of  time  records  increasing  in  a 
cumulative  manner.  The  contiguity  is  satisfactory  for  record  lengths 
larger  than  the  FFT  processing  time.  A  typical  real  time  bandwidth  for 
one  channel  mode  is  a  frequency  span  of  800  Hz  with  averaging  off  and  a 
frequency  span  of  3.2  KHz  with  fast  averaging.  All  these  characteristics 
are taken from the operating manual of the HP 35660A dynamic signal
analyzer.

The experimental conditions of the test were a backward facing step of
25.4 mm height (h) formed on the floor of a low speed wind tunnel of test
section height of 304.8 mm and width of 405 mm. The free stream wind
speed was 2 m/s giving $Re_h = Uh/\nu \approx 2940$ with laminar separation and
turbulent reattachment. A single hot-wire was located in the
recirculation region at x/h ≈ 5.5 and y/h = 0.5. The signal from a DISA
55A01 constant temperature anemometer was fed in parallel to a Kikusui
storage oscilloscope DSS 5020A and an HP 35660A Dynamic Signal
Analyzer, and plots were made on an HP Color Pro Plotter.

Since  the  objective  was  to  detect  low  frequency  oscillations,  the 
selection  of  frequency  span  was  the  first  step.  By  looking  at  the  time 
traces  and  power  spectra  for  NTR  =  1  and  NTR  =  250  at  frequency  spans 
of  25  Hz,  200  Hz,  1.6  KHz  and  3.2  KHz,  it  was  decided  that  a  frequency 
span  of  200  Hz  was  most  suitable  for  the  present  analysis.  Figure  6 
shows a typical time trace and power spectra for NTR = 1, 5 and 250 for
the  200  Hz  frequency  span. 


Figure 6: A Time Trace and Power Spectra for
200 Hz Frequency Span; NTR = 1, 5, 250.
Cumulative short-time average power spectra for NTR = 5, 10, 20
and 50 are presented in Figure 7 to show the effect of averaging on the
shape of the power spectral curve. As is to be expected, increasing
NTR smooths the spectral curve, so that conspicuous peaks are no longer
displayed after averaging.
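
Sequential and cumulative averaging of FFT power spectra over NTR time records can be imitated in software. The sketch below is not the analyzer's algorithm: the function name is hypothetical, a Hann window and per-record mean removal are assumed, and all FFT bins are kept rather than the first 401.

```python
import numpy as np

def averaged_spectra(signal, record_len=1024, ntr=5, cumulative=False):
    """Power spectra averaged over NTR time records, sequentially or cumulatively."""
    signal = np.asarray(signal, dtype=float)
    n_records = signal.size // record_len
    records = signal[: n_records * record_len].reshape(n_records, record_len)
    records = records - records.mean(axis=1, keepdims=True)  # remove each record's mean
    window = np.hanning(record_len)                           # assumed window
    power = np.abs(np.fft.rfft(records * window, axis=1)) ** 2
    if cumulative:
        # averages over the first NTR, 2*NTR, 3*NTR, ... records
        counts = np.arange(ntr, n_records + 1, ntr)
        return np.array([power[:c].mean(axis=0) for c in counts])
    # sequential: each output row averages its own group of NTR records
    n_groups = n_records // ntr
    return power[: n_groups * ntr].reshape(n_groups, ntr, -1).mean(axis=1)
```

With `cumulative=False` each row plays the role of one short-time averaged spectrum; with `cumulative=True` the rows show how the curve smooths as more records are included.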


Figure 7: Cumulative VISA for NTR = 5, 10, 20, and 50.
Frequency Span = 200 Hz.


Figure 8 shows power spectra of four contiguous time records each
of NTR = 5. It may be observed that low frequency peaks are clearly
discernible but there is no repeatability of peaks between any two
spectra. Most of the dominant peaks are observed below 10 Hz.

Figure 8: Typical VISA; NTR = 5. Frequency Span = 200 Hz.

Stepwise and cumulative energy curves averaged over 10 records of
NTR = 5 in 10 Hz intervals are shown in Figure 9. This indicates that
one third of the total energy is in the low frequency range of 0-10 Hz.
The value of 10 Hz corresponds to $St_h = 0.127$ and $St_\theta = 0.006$, where
$St_h = nh/U_\infty$, $St_\theta = n\theta/U_\infty$, h = 2.5 cm is the step height and n is the
frequency in Hz.
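
The quoted Strouhal numbers can be checked with a short calculation. The function name is illustrative; the step height is taken as the 25.4 mm given in the experimental description, which reproduces 0.127 (the rounded 2.5 cm would give 0.125). The momentum thickness is not stated in the text, so the value below is only what the quoted number 0.006 would imply.

```python
def strouhal(frequency_hz, length_m, velocity_m_s):
    """Strouhal number St = n * L / U."""
    return frequency_hz * length_m / velocity_m_s

st_h = strouhal(10.0, 0.0254, 2.0)   # step-height Strouhal number, about 0.127
theta_implied = 0.006 * 2.0 / 10.0   # implied momentum thickness in m, about 1.2 mm
```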
Figure 9: Stepwise and Cumulative Averaged
Energy Distribution; NTR = 5. Frequency
Span = 200 Hz.

In  the  absence  of  repeatability  of  peaks,  even  in  0-10  Hz  for  the 
short-time  averaged  spectra  (NTR  =  5),  a  histogram  of  peak  frequencies  in 
the  0-10  Hz  range  was  made  for  50  records  of  short-time  averaged  spectra 
(NTR  =  5).  This  histogram  is  shown  in  Figure  10.  No  single  dominant 
frequency  is  apparent  from  this  histogram.  Instead,  a  group  of 
frequencies  is  present  more  often  than  others.  For  the  present  case  they 
are:  frequencies  of  2  Hz,  4  Hz,  and  1  Hz,  in  the  descending  order  of  the 
histogram  values. 
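
A histogram of this kind can be built by locating the largest peak in the 0-10 Hz band of each short-time averaged spectrum and counting how often each frequency occurs. The sketch below is illustrative: the function name is hypothetical, only the single strongest bin per spectrum is counted, and the zero-frequency bin is excluded, neither of which is specified in the text.

```python
import numpy as np
from collections import Counter

def peak_frequency_histogram(spectra, freqs, f_max=10.0):
    """Count the frequency of the strongest bin in (0, f_max] for each spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    band = (freqs > 0) & (freqs <= f_max)  # exclude the zero-frequency bin (assumption)
    band_freqs = freqs[band]
    peaks = [float(band_freqs[np.argmax(s[band])]) for s in spectra]
    return Counter(peaks)
```

For a 200 Hz span displayed on 401 points the bin spacing is 0.5 Hz, so `freqs` would be `np.arange(401) * 0.5` in that setting.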


Figure  10:  Frequency  Histogram  of  Peaks;  NTR=5. 
Frequency  Span  =  200  Hz. 
4. The Structure of Turbulence

The  experimental  investigations  reported  in  this  paper  are 
concerned  with  the  structure  of  turbulence  in  turbulent  shear  flows. 
One  of  the  turbulent  shear  flows  considered  is  the  turbulent  boundary 
layer  while  the  other  is  the  turbulent  separated  flow.  The  objective  has 
been  to  detect  the  coherent  structures  in  the  two  cases:  the  burst  by 
means  of  VITA  algorithm,  and  the  flapping  oscillations  by  means  of  VISA 
algorithm. 

Bursts  are  delineated  by  defining  a  cycle  of  fluctuation  (about  the 
conventional  long-time  mean)  as  a  unit  of  short-time  averaging.  This 
makes  the  short-time  averaging  period  a  truly  variable  quantity,  unlike 
the  fixed  short-time  averaging  period  employed  by  most  investigators 
with  some  amount  of  arbitrariness  in  selecting  the  fixed  averaging  time. 
A  fluctuating  cycle  (about  the  conventional  long-time  average)  can  be 
regarded  to  represent  an  eddy,  and  a  coherent  structure  can  be  regarded 
as  made  up  of  contiguous  eddies.  So  far  the  definition  of  a  turbulent 
eddy  has  been  an  enigma  except  for  its  identification  with  a  scale  of 
motion,  and  the  belief  that  a  turbulent  eddy  is  not  a  wave  (possibly  a 
wave  packet). 

The  detection  of  flapping  oscillations  in  the  separated  turbulent 
flow on a backward facing step by means of short-time averaging applied
in  the  frequency  domain  implies  the  identification  of  turbulent  separated 
flow  with  the  coherent  structures.  Most  of  the  previous  attempts  to 
detect  the  flapping  oscillations  have  been  made  by  focusing  attention  in 
the  free  shear  layer  region  of  BFS  flow.  In  the  present  case,  the  single 
hot-wire  is  located  at  y/h  =  0.5  in  the  recirculating  region,  and  the 
flapping  oscillations  are  regarded  as  due  to  overall  coherent  structure  in 
the  BFS  cavity  region. 

5.  Conclusions 

Two  short-time  averaging  schemes  applied  to  single  hot-wire  data 
are  presented. 

The  first  scheme  employs  a  fluctuating  cycle  of  flow  oscillation 
about  the  mean  as  a  unit  of  short-time  averaging.  With  a  threshold 
value  chosen  as  the  long  time  variance,  the  scheme  is  able  to  delineate 
the  bursts  and  provide  values  for  the  burst  frequency,  intermittency,  and 
other  statistical  characteristics  of  energetic  events  and  quiescent  flow  at 
$y^+ \approx 15$ in a turbulent boundary layer.

The  second  scheme  considers  the  power  spectra  averaged  over  a  few 
records  for  a  frequency  span  of  200  Hz  in  the  recirculation  region  of  a 
backward  facing  step  airflow  at  Reh  =  2940,  and  x/h  =  5.5,  y/h  =  0.5  to 
detect  the  flapping  oscillations.  A  group  of  frequencies  of  2  Hz,  4  Hz  and 
1  Hz  in  the  descending  order  of  the  histogram  values  is  observed. 
SCALE-INVARIANT FORMULATION OF NONSTATIONARY SIGNALS

A scale-invariant formulation of nonstationary signals is posed. The
formulation  engenders  from  the  multiplicative  group  composition  law  of 
algebra  rather  than  from  the  standard  additive  law.  In  particular,  the 
explicit  times  at  which  signal  values  are  correlated  are  converted  into  the 
principal coordinates of the formulation according to a nonlinear
transformation which features the nonstationary character of the signal
as  a  function  of  the  geometric  mean  of  the  times  rather  than  the 
standard  arithmetic  mean.  Traditional  time-lags  are  accordingly 
converted  into  time  ratios.  The  advantages  of  this  approach  are 
discussed,  particularly  the  related  absence  of  correlation  scales  as  well  as 
the  nonmetric  aspects  of  the  coordinate  transformation  defining  the 
principal  coordinates.  Simple  properties  of  second-order  and  third-order 
correlations  are  highlighted.  A  spectral  version  of  the  formulation  is  also 
developed,  and  it  is  shown  that  the  Mellin  transform  is  the  transform  best 
suited  to  this  formulation. 

1.  Introduction 

From  the  simplest  perspective  of  engineering  and/or  physics,  a 
nonstationary  random  signal  is  one  in  which  at  least  one  of  its 
measurable  statistics  (e.g.  mode,  median,  mean,  rms,  skew,  etc.)  is 
dependent  upon  the  time-scale,  say  T,  available  for  observation  of  the 
signal  as  well  as  on  the  initial  time  t  of  the  observation.  If  the 
phenomenon  under  investigation  is  nonlinear,  the  stated  t,T-dependence 
of one statistic typically guarantees a likewise dependence of another,
and so on. In this respect nonstationary signals are different from
stationary  signals  in  that  for  a  stationary  signal,  if  T  is  "long  enough," 
all  of  its  measurable  statistics  are  independent  of  both  t  and  T. 
Occasionally,  one  may  encounter  a  nonstationary  signal  whose 
measurable  statistics  are  independent  of  either  t  or  T,  but  such  is  the 
exception  rather  than  the  norm.  There  are  some  underlying  features  of 
nonstationary signals which are independent of both t and T, but these
are  not  obvious,  and  exist  largely  in  idealized  cases.  The  present 
communication  investigates  one  such  case. 

The  theory  of  nonstationary  random  signals  unfortunately  still  lacks 
firm  and  solid  foundations.  From  a  practical  perspective,  though,  it  is 
imprudent  to  procrastinate  until  such  foundations  are  installed  before 
applied  analyses  are  attempted.  Traditionally,  the  hesitation  which  the 
practical  analyst  expresses  toward  enjoining  the  problem(s)  associated 
with  nonstationary  phenomena  arises  from  the  virtual  absence  of  the 
appropriate  necessary  tools  and/or  methodologies  for,  and  this  next  part 
is critical, describing and analyzing such phenomena in a mathematically
simple  way.  The  recent  advent  of  the  theory  of  wavelets  has  enormously 
stimulated  the  interest  of  the  signal  processing  community  with  respect 
to  nonstationary  analysis,  and  resulted  in  a  virtual  explosion  in  the 
number  of  related  technical  papers  currently  appearing  in  professional 
journals.  In  particular,  the  oft-stated  contention  that  wavelets  are  more 
suited  than  are  Fourier  or  other  classical  methods  to  the  analysis  and 
decomposition  of  nonstationary  random  processes  has  aroused  the 
curiosity  of  more  than  one  investigator  and  his  (her)  concomitant 
funding  source.  Wavelets  have  indeed  on  many  instances  successfully 
been  applied  to  the  analysis  of  nonstationary  signals;  but  it  appears,  at 
least  at  this  point  in  time,  in  a  way  which  does  not  readily  lend  itself  to 
the  extraction  from  the  results  of  such  analyses  those  parameters  that 
allow  the  analyst  to  unilaterally  quantify,  in  a  practical  and  meaningful 
way, the explicit nature of the attendant nonstationary behavior. Recall
that  from  an  applied  science  approach,  the  most  fundamental  intent  of 
signal  analysis  is  to  determine  from  same  a  sufficient  number  of  the 
defining  statistics  of  the  signal  to  effect  some  design-related  decision. 
These  statistics,  keep  in  mind,  are  in  most  cases  time-varying  and  scale- 
dependent.  Quantifications  of  this  nature,  economic  in  the  sense  that 
they  reduce  the  many  to  the  few  rather  than  increase  the  few  to  the 
many,  as  wavelet  analysis  seems  to  do,  are  the  ones  which  in  practical 
endeavors  are  typically  held  in  highest  regard.  Wavelets  and  wavelet 
analysis,  though,  should  eventually  find  their  proper  place  in  the 
literature  as  an  advanced  tool  for  signal  analysis  provided  it  can  be 
established  which  wavelets  are  best  suited  to  the  various  types  of 
nonstationarities  which  manifest  in  everyday  scenarios.  Elsewhere  in 
this  volume  (Andreas  and  Trevino,  1996)  is  presented  a  formulation 
consistent  with  this  intent. 

To  somewhat  alleviate  the  current  state  of  affairs,  then,  the  attention 
of  this  communication  is  directed  to  the  study  of  the  correlation 
structure of a certain fundamental class of nonstationary signals, one
whose related characteristics are broad but whose defining structure can
nonetheless be simply modeled. The motivation is supplied by the belief
that  understanding  a  given  phenomenon  is  always  preceded  by  a  simple 
and  concise  elucidation  of  same,  coupled  eventually  with  careful 
exploration,  and  exploitation,  of  its  consequences.  Although  not 
suggested  by  the  title,  the  intended  class  is  the  by-product  of  what  could 
be  termed  "the  natural  pursuit  of  scaling,"  though  perhaps  a  more 
appropriate  designation  is  the  "natural  pursuit  of  re-scaling."  The 
objective  is  to  fully  understand  the  analytical  consequences  of  its 
particularly  simple  features,  especially  the  imposed  symmetry,  and 
simultaneously evoke a concept of nonstationary behavior which is in
some  sense  positive,  with  an  accompanying  degree  of  determinism. 


2.  The  Nature  of  Nonstationarity 

As discussed above, nonstationary random behavior is behavior some
or all of whose defining statistics are explicitly time-dependent. "Time-
dependent," again, means time-dependent in the sense of "instantaneous
time" and in the sense of "time-scale." Specifically, if the histogram of a
nonstationary  random  signal  is  tabulated  for  the  total  number  of  data 
values  occurring  over  a  time  interval  defined  by  [t,  t+T],  the  defining 
statistics  of  the  said  histogram  will  vary  with  both  t  and  T.  It  is 
assumed,  by  the  way,  that  the  measured  signal  is  of  some  continuous 
ongoing  physical  phenomenon  which  essentially  has  no  beginning  and 
no end. Note that for a stationary process if T is "long enough," the
shape  of  the  histogram  is  independent  of  t  and  only  the  "number  of 
occurrences"  increases  with  increasing  T.  That  is,  the  graph  of  number 
of  occurrences  vs  signal  values  is  only  magnified  along  the  ordinate  by  a 
factor  which  depends  upon  the  increase  in  T  only,  and  not  on  the  specific 
value  of  the  signal.  Actually,  such  a  dependence  with  increasing  T  in  the 
shape  of  the  histogram  is  perhaps  more  definitive  of  an  ergodic  process 
than it is of a stationary process, but in everyday practice ergodicity and
stationarity are invoked interchangeably. At once, then, the "two-
dimensional" nature of nonstationary behavior becomes obvious. All
physically meaningful statistical characteristics of the behavior depend
on both instantaneous time and total time of measurement/observation.
A  stationary  process,  on  the  other  hand,  is  zero  dimensional  in  that  if 
the  measurement  time  is  long  enough,  all  meaningful  statistical 
characteristics  of  the  behavior  are  independent  of  both  time  and  scale. 
The  histogram,  recall,  is  a  depiction  of  the  number  of  relative 
occurrences of the signal values over one finite event of the process.
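
The two-dimensional (t, T) dependence described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic signal (Gaussian noise with a linearly growing rms), the helper name `window_stats`, and the parameter values. It tabulates the mean and rms over the interval [t, t+T] for a nonstationary record and for a stationary reference.

```python
import math
import random

def window_stats(x, dt, t, T):
    """Mean and rms (about the mean) of the samples x[k] with t <= k*dt < t + T."""
    i0 = int(round(t / dt))
    i1 = int(round((t + T) / dt))
    seg = x[i0:i1]
    mean = sum(seg) / len(seg)
    rms = math.sqrt(sum((v - mean) ** 2 for v in seg) / len(seg))
    return mean, rms

random.seed(0)
dt = 0.01
n = 20000  # 200 time units in total

# Hypothetical nonstationary signal: Gaussian noise whose rms grows linearly in time.
x = [(1.0 + 0.05 * k * dt) * random.gauss(0.0, 1.0) for k in range(n)]
# Stationary reference: the same noise model with a constant rms.
s = [random.gauss(0.0, 1.0) for _ in range(n)]

# Same T, different starting times t: the nonstationary rms changes with t.
rms_early = window_stats(x, dt, t=0.0, T=50.0)[1]
rms_late = window_stats(x, dt, t=150.0, T=50.0)[1]
# Same t, longer T: the nonstationary rms also changes with T.
rms_long = window_stats(x, dt, t=0.0, T=200.0)[1]

# For the stationary reference the two windows give nearly the same rms.
s_early = window_stats(s, dt, t=0.0, T=50.0)[1]
s_late = window_stats(s, dt, t=150.0, T=50.0)[1]
```

In this sketch the rms of the nonstationary record depends on both the starting time and the window length, while the stationary reference returns essentially the same value for either window.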

Nonstationary behavior does not readily reveal what could be
designated as "universal structure," and is accordingly not what would
normally be considered an idealized type of behavior. In the classical
formulation of the multi-time correlations peculiar to nonstationary
analysis, one of the more elementary decisions an analyst must make in
order  to  quantify  the  temporal  evolution(s)  of  same  is  the  choice  of 
appropriate  time  scales  with  which  to  nondimensionalize  the  various 
time  lags  that  typically  appear  in  the  correlations.  Time  scales  for  the 
temporal  evolutions  of  the  rms,  skew,  kurtosis,  etc.  are  typically  defined 
by  the  process  itself,  and  in  this  sense  are  natural  to  the  phenomenon. 
The analyst is (usually) not at liberty to modify these. Multi-time
correlations, recall, are integral inner-products which the analyst uses to
establish a relationship between the behavior of a random signal at one
time instant with that at another. The proper choice of time scale for
time-lag nondimensionalization, unfortunately, has never been fully
addressed in the literature, and consequently a variety of differing time
scales have historically been introduced to satisfy particular conditions
that arise in specific analyses. One such scale is defined as

$$\eta(t,\tau) = -\,\ldots \tag{1}$$

while a second is

$$\Lambda(t) = \int_0^{\infty} R(t,\tau)\,d\tau . \tag{2}$$


An interesting feature of the time scale defined by Equ. (1) is its
$\tau$-dependence, which essentially allows the "larger" scale behavior of the
considered signal to be scaled in a manner different from the "smaller"
scale behavior (~ "scale-dependent" scaling, sometimes denoted
"generalized" self-similarity). In both of these definitions R( ) is the
two-time correlation of the given random signal normalized by the square of
its rms. Scales peculiar to third-order, or even higher-order, correlations
can also be defined, as can the appropriate normalization factors. In the
analysis of "turbulent" fluid phenomena, other time-dependent scales
such as the Kolmogorov scale and the dissipation scale are not
uncommon. The dissipation scale of turbulence is analogous to the
scale defined in Equ. (1) evaluated, though, at $\tau = 0$, while the
Kolmogorov scale is that scale which makes the turbulence Reynolds number
approximately equal to unity.
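
As a small numerical check on the integral time scale of Equ. (2), the following Python sketch evaluates the integral for a fixed t with a trapezoidal rule. The exponential correlation model R = exp(-tau/L0), the value of L0, the finite upper limit, and the function name `integral_time_scale` are assumptions chosen only for illustration; for this model the integral should return L0 itself.

```python
import math

def integral_time_scale(R, tau_max, n=100000):
    """Trapezoidal estimate of the integral of R(tau) from 0 to tau_max."""
    h = tau_max / n
    total = 0.5 * (R(0.0) + R(tau_max))
    total += sum(R(k * h) for k in range(1, n))
    return total * h

L0 = 2.5  # hypothetical correlation time of the model

# The infinite upper limit is truncated at 40 correlation times, where
# exp(-40) is negligible compared with the tolerance of interest.
Lam = integral_time_scale(lambda tau: math.exp(-tau / L0), tau_max=40.0 * L0)
```

With a measured correlation the same routine would be applied to tabulated values of R(t, tau) at each t of interest, so that the resulting scale inherits the t-dependence of the correlation.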

The  particular  choice  of  time  scale  is  important  because  it  is  felt  by 
many  investigators  that  when  random  signals  are  "properly"  scaled,  they 
reveal  features  which  are  independent  of  how  the  signal  is  physically 
measured.  Features  which  are  independent  of  how  a  signal  is  measured, 
one  might  suspect,  suggest  a  universality  of  sorts,  which  in  turn  suggests 
properties  that  are  manifestations  strictly  of  the  mechanisms  of  the 
random behavior itself and not of any physical mechanism(s) which
create(s) the behavior: a sort of "self-governance," if you will, much like
the  thermal  efficiency  of  a  Carnot  cycle  in  thermodynamics  which,  as  is 
well  known,  is  independent  of  the  working  fluid.  The  isolation  of  such 
autonomy  in  random  behavior  is  potentially  the  beginning  of  the 
formulation  of  any  law,  or  laws,  that  define  or  describe  the  related 
underlying  physics. 

The need for a time scale in the classical formulation of nonstationary
random signals arises because it is both natural and convenient to
model the temporal evolution of, say, the standard two-time correlation,
$C(t_1,t_2) = \langle x(t_1)x(t_2)\rangle$, of such a signal in terms of a time-dependent
time scale as $C(t,\tau) \approx \sigma^2(t)R\{\tau/L\}$ where $t = (t_1+t_2)/2$, $\tau = (t_2-t_1)$, and L
is the chosen time-scale which is in some sense characteristic of the
second-order correlation structure of the signal, and almost invariably
varies with t. Consistent with accepted notation, angular brackets,
$\langle\ \rangle$, denote ensemble averaging and $\sigma(t)$ denotes the time-dependent rms
of the signal. This approximation is equivalent to writing the measured
random signal itself as the product $x(t) = \sigma(t)\xi(t)$, and ultimately in fully
nondimensionalized form as $x(t) = \sigma(t)\xi\{t/L\}$. The premier feature of this
latter expression is the roles of both $\sigma$ and L as time-dependent scaling
parameters. The t-dependence of L, again, is required because classical
nonstationarity in its most complete form demands that the correlation
structure (~ "degree of randomness", frequency content, etc.) of the
normalized random process $\xi(t)$ also vary with time. In other words, there
is no guarantee that for purposes of second-order correlation analyses,
the complete averaged time-behavior of x(t) is captured by $\sigma(t)$ only. For
third-order correlations the signal is instead decomposed as
$x(t) = \nu(t)\xi\{t/M\}$ where $\nu$ is the skew of the process and M is a
time-dependent time scale in some sense characteristic of the third-order
correlation structure.
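
The step from the product form of the signal to the approximate product form of the two-time correlation can be made explicit with a short worked expansion. The steps below are only a sketch, and rest on two assumptions not stated in the text: the rms $\sigma(t)$ is twice differentiable, and L is treated as constant over the lag $\tau$, so that the correlation of the normalized process depends on $\tau/L$ alone.

```latex
\begin{align*}
C(t,\tau) &= \langle x(t-\tau/2)\,x(t+\tau/2)\rangle
           = \sigma(t-\tau/2)\,\sigma(t+\tau/2)\,R\{\tau/L\}, \\
\sigma(t-\tau/2)\,\sigma(t+\tau/2)
          &= \sigma^2(t) + \frac{\tau^2}{4}\left[\sigma(t)\,\sigma''(t) - \sigma'(t)^2\right] + O(\tau^4), \\
C(t,\tau) &\approx \sigma^2(t)\,R\{\tau/L\}
           \quad\text{when}\quad \tau^2\left|\sigma\sigma'' - \sigma'^2\right| \ll 4\sigma^2 .
\end{align*}
```

The correction is second order in $\tau$, so under these assumptions the product form is accurate whenever the rms changes little over one correlation time.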

3.  Examples  of  Nonstationary  Behavior 

Perhaps  the  simplest  example  of  nonstationary  behavior  is  that  of 
observing  and  measuring  over  a  short  time-interval  one  member  of  a 
random  process,  say  s(t),  which  is  known  to  be  stationary.  Denoting  the 
measurement time-interval as T, the recorded signal is thus x(t) = [U(t) -
U(t-T)]s(t), where U(t) is the unit-step function. The two-time correlation
function  of  the  recorded  signal  is  accordingly 

$$\langle x(t_1)x(t_2)\rangle = \langle x(t-\tau/2)\,x(t+\tau/2)\rangle \equiv C_x(t,\tau) = \left[U(t-\tau/2) - U(t-T+\tau/2)\right] C_s(\tau), \tag{3}$$

where in a practical scenario the correlation of particular interest is
invariably $C_s(\tau) = \langle s(t-\tau/2)s(t+\tau/2)\rangle$ rather than the computed $C_x(t,\tau)$.
Recall that in practice, $\langle x(t-\tau/2)x(t+\tau/2)\rangle$ is commonly approximated as

$$c_x(\tau,T) = \int_0^{T} x(t-\tau/2)\,x(t+\tau/2)\,dt. \tag{4}$$

The situation at hand, then, is clearly a special case of the general
problem, $x(t) = w(t)s(t)$, where w(t) is a known window function typically of
some windowing scale, T. A straightforward computation for the general
case produces $C_x(t,\tau) = w(t-\tau/2)\,w(t+\tau/2)\,C_s(\tau)$, indicating that the
two-time correlation of the recorded signal is unavoidably a t- and
$\tau$-dependent modulation of $C_s(\tau)$. For values of $\tau$ small in comparison to T,
say roughly $(\tau/T) < 0.1$, the difference between $C_x(t,\tau)$ and $C_s(\tau)$ is minimal,
but for correspondingly large values of $\tau$ the difference can be
substantial. The exact net effect, though, is best analyzed by means of
frequency analysis. A very fundamental "hardware-oriented" analysis of
how the frequency content of a short-lived signal is distorted by a
transient recorder is presented by Leighton (1988).
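
The window modulation in this example can be checked directly. The Python sketch below compares the product of the rectangular window evaluated at t - tau/2 and t + tau/2 with the bracketed step-function factor of Equ. (3), over a grid of t and of lags 0 <= tau <= T. The convention U(0) = 1, the grid, the value of T, and all function names are assumptions made for this illustration.

```python
def U(t):
    """Unit step, taken here as 1 for t >= 0 (a convention chosen for this sketch)."""
    return 1.0 if t >= 0.0 else 0.0

def window(t, T):
    """Rectangular recording window U(t) - U(t - T)."""
    return U(t) - U(t - T)

def modulation_direct(t, tau, T):
    """Product of the window at the two correlated times."""
    return window(t - tau / 2, T) * window(t + tau / 2, T)

def modulation_eq3(t, tau, T):
    """Step-function form of the same factor, valid for 0 <= tau <= T."""
    return U(t - tau / 2) - U(t - T + tau / 2)

T = 4.0
grid_t = [k * 0.25 for k in range(-8, 25)]  # t from -2 to 6
grid_tau = [j * 0.5 for j in range(0, 9)]   # tau from 0 to T

# Collect every grid point at which the two forms disagree.
mismatches = [(t, tau) for t in grid_t for tau in grid_tau
              if modulation_direct(t, tau, T) != modulation_eq3(t, tau, T)]
```

For a given lag the factor is nonzero only for tau/2 <= t < T - tau/2, an interval of length T - tau, which is why lags small compared with T leave the stationary correlation nearly undisturbed.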

Another example of a nonstationary process is a phenomenon known as
1/f noise. The accepted definition of 1/f noise is noise whose power
spectrum is of the form $\varphi(f) \approx (\text{constant})/f^{\gamma}$, where $0 < \gamma < 2$
and f denotes cyclic frequency; invariably, though, $\gamma = 1$. Because the
measured signal, say X(t), is nonstationary, the term "power spectrum" is
invoked here in a rather "loose" sense, and what is really intended by
this designation is instead the absolute value of the periodogram,

$$Z(\omega,T) = \int_0^{T} X(t)\exp(-i\omega t)\,dt, \tag{5}$$

where $\omega$ is the angular frequency, equal to $2\pi f$. The "power spectrum" of
the signal is then found according to $\varphi(f) \approx \Phi(\omega,T) = Z(\omega,T)Z^*(\omega,T)$.
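
A discretized version of the finite-record transform of Equ. (5) can be sketched in a few lines of Python. The sketch is an illustration only: it uses a left Riemann sum in place of the integral, a deterministic cosine test record rather than measured 1/f noise, and hypothetical names (`finite_fourier`, `periodogram`) and parameter values. Any normalization by the record length is deliberately left to the caller.

```python
import cmath
import math

def finite_fourier(X, dt, omega):
    """Left Riemann-sum approximation of the integral of X(t) exp(-i omega t) over [0, T]."""
    return sum(x * cmath.exp(-1j * omega * k * dt) for k, x in enumerate(X)) * dt

def periodogram(X, dt, omega):
    """Z times its complex conjugate for the record X (no 1/T normalization applied)."""
    Z = finite_fourier(X, dt, omega)
    return (Z * Z.conjugate()).real

f0 = 5.0                     # hypothetical test frequency, in Hz
omega0 = 2.0 * math.pi * f0  # corresponding angular frequency
dt = 0.001
T = 2.0                      # record length, equal to 10 full periods of the test signal
X = [math.cos(omega0 * k * dt) for k in range(int(round(T / dt)))]

Z_peak = finite_fourier(X, dt, omega0)           # expected magnitude T/2
P_peak = periodogram(X, dt, omega0)              # expected value (T/2)**2
P_off = periodogram(X, dt, 2.0 * math.pi * 7.0)  # expected to vanish at 7 Hz
```

Because the result depends explicitly on the record length T, the quantity is a "spectrum" only in the loose sense noted above.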

Lastly, an indeed very interesting example of nonstationary behavior is
the case of turbulence generated by a grid in a wind or water tunnel, or
some other such mechanism, which then decays as it is advected by the
mean flow away (downstream) from the turbulence-generating source. Such
turbulence is usually denoted as nonhomogeneous steady-state turbulence
because its statistical characteristics vary with spatial position and not
with time; but a t-dependence can be introduced into the behavior by
invoking the "frozen" turbulence assumption, sometimes known as
Taylor's hypothesis, allowing downstream distance from the source to be
approximated as Ut, where U denotes the mean flow speed. In this way,
spatial  variations  in  the  stochastic  averages  of  the  turbulence  can  be 
instead  modeled  in  terms  of  analogous  time-dependencies. 

4.  Invariance  in  Nonstationary  Phenomena 

The  concepts  of  invariance  (Salmon,  1885;  Grace  and  Young,  1903; 
Gurevich,  1964)  have  quietly  been  applied  to  the  analysis  of  random 
processes  since  the  late  thirties.  Publishing  in  the  1904-05  issue  of  the 
Hibbert  Journal,  C.  J.  Keyser  describes  invariance  as  "changelessness  in 
the  midst  of  change,  abiding  things  in  a  world  of  flux,  configurations 
that  remain  the  same  despite  the  swirl  and  stress  of  countless  hosts  of 
curious  transformations."  The  importance  of  invariance  has  already 
proved  itself  in  such  diverse  fields  as  algebra,  geometry,  theoretical 
physics, and psychology, and its use as a bona fide tool of thought is
exemplified  by  the  classical  notion  of  conservation  of  energy.  In  fluid 
mechanics  this  law  is  accompanied  by  conservation  of  mass, 
conservation  of  linear  momentum,  and  conservation  of  angular 
momentum.  Scientists,  in  their  unending  quest  for  knowledge,  typically 
search  for  invariant  properties  whether  they  know  it  or  not,  such  being 
simply  a  manifestation  of  the  natural  aspiration  for  generality.  In 
practically  all  sciences  the  simplest  principles  that  simultaneously  have 
the  broadest  application  are  the  ones  which  are  most  valued.  Invariance 
is most subtle in that it asserts that the substantive laws of a given field
should be expressible in a manner which is strictly independent of the
particular  scales  chosen  to  measure  the  related  field  variables.  The 
Hibbert  Journal,  by  the  way,  is  a  "quarterly  review  of  religion,  philosophy, 
and  theology;"  it  was  discontinued  in  1968. 

The existence of invariant structure in random behavior has been
suspected for some time. In fact, invariance underlies and is intimately
connected to the very definition of stationarity itself. To be sure, a
stationary  random  signal  is  defined  as  one  whose  probability  density 
function  (pdf)  is  independent  of  arbitrary  translations  in  time,  and 
therefore  one  whose  statistical  structure  is  independent  of  the  particular 
choice  of  reference  origin.  This  well-known  translation  invariant 
formulation  of  random  signal  analysis  has  its  roots  in  the  principle  of 
theory  construction  that  since  the  laws  of  mechanics  are  invariant  under 
uniform  translations  (and  rotations)  of  the  underlying  coordinate 
system,  the  phenomenon  under  investigation  should  likewise  reveal 
similar features. A transformation of the type $t' = t + \varepsilon$, where $\varepsilon$ is a
constant, is perhaps the most fundamental of those transformations
which reflect this property, but a transformation of the type $t' = At + \varepsilon$,
where A is a constant, also reflects the translation invariant (but scale-
dependent) property. Translation invariance implies that the very idea of
"origin" is an illusion, acknowledging only a "before" and an "after," but
not when this distinction occurs. Nonstationary processes, however, are
not invariant with respect to these types of transformations, and the
time-dependencies of $\sigma$ and L indicated above clearly establish that as
time evolves, $C(t+\Delta t,\tau) \neq C(t,\tau)$. This accordingly suggests that
nonstationary processes are perhaps invariant under a more general type
of  transformation. 

In order to introduce, then, the concept(s) of invariance into the
analysis  of  nonstationary  signals,  it  is  necessary  to  invoke  a  more 
general  type  of  invariance;  the  intended  invariance  is  scale  invariance. 
The  ensuing  formulation,  in  a  somewhat  less  restrictive  form,  is  related 
to  the  formulation  of  Gray  and  Zhang  (1988). 

5.  Scale-invariant  Formulation 

Consider a nonstationary random signal, denoted by x(t), whose n-th
order probability density function (pdf) is $p\{x(t_1),x(t_2),x(t_3),\ldots,x(t_n)\}$.
For purposes of first- and second-order moment analysis only, suppose that
$p\{x(t)\} = p\{x(At)\}$, and $p\{x(t_1),x(t_2)\} = p\{x(At_1),x(At_2)\}$, where A is
potentially a time-dependent scale change. Since for this type of
transformation there is no invariant in the instantaneous time t, it
follows that the mean of such a process, in order to satisfy the required
invariance, must be a constant, which for practical purposes can be taken
to be zero. When A is a constant, though, the two-time correlation is a
function of $\alpha = t_2/t_1$ only, since this is the only invariant in two times
under this type of scale change transformation. Thus for such a
"scale-invariant process," $\langle x(t)\rangle = 0$ and $\langle x(t_1)x(t_2)\rangle = \varrho(\alpha)$. This
is the result of Gray and Zhang referenced above, but they called their
formulation M-stationary. Actually, this simple type of invariance is
perhaps too restrictive for many types of nonstationary signals
encountered in practice, and what is required in order to relax the
invariance is to first determine the square of the signal rms according to
$\langle x^2(t)\rangle = \sigma^2(t)$ and then re-write x(t) in normalized form as
$\sigma(t)\xi(t)$, where $\xi(t)$ is a random signal of unit rms, and pdf
$\varpi\{\xi(t_1),\xi(t_2)\}$. Imposing now the indicated scale-change invariance on
the pdf of $\xi(t)$ forces this signal, consistent with x(t), to also have a zero
mean, and a second-order correlation, call it R, which is still a function
of $\alpha$ only. In this approach, the time-dependence of the signal rms,
unlike in the more restrictive invariance, is not lost, a rather key
ingredient to nonstationary analysis, and furthermore the need to
introduce a time-scale into the two-time correlation function is avoided.
Since $R(\alpha)$ is independent of scale, it is fractal in structure.
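
The contrast between a correlation that depends only on the ratio of the two times and one that depends only on their difference can be seen in a small Python sketch. The specific model used, exp(-|ln(t2/t1)|), is a hypothetical choice made for illustration and is not proposed in the text; it is simply a function of the time ratio alone, defined for positive times. The function names and numerical values are likewise assumptions.

```python
import math

def R_scale_invariant(t1, t2):
    """Hypothetical correlation depending on the time ratio alone (t1, t2 > 0):
    exp(-|ln(t2/t1)|), which equals min(ratio, 1/ratio)."""
    ratio = t2 / t1
    return math.exp(-abs(math.log(ratio)))

def R_translation_invariant(t1, t2, L=1.0):
    """Classical stationary counterpart, a function of the lag t2 - t1 only."""
    return math.exp(-abs(t2 - t1) / L)

t1, t2, A = 2.0, 5.0, 3.0

r0 = R_scale_invariant(t1, t2)
r_scaled = R_scale_invariant(A * t1, A * t2)       # unchanged by a pure rescaling of time
r_shifted = R_scale_invariant(t1 + 1.0, t2 + 1.0)  # changed by a shift of origin

c0 = R_translation_invariant(t1, t2)
c_shifted = R_translation_invariant(t1 + 1.0, t2 + 1.0)  # unchanged by a shift of origin
c_scaled = R_translation_invariant(A * t1, A * t2)       # changed by a rescaling of time
```

The two models trade places under the two transformations: rescaling leaves the ratio-dependent correlation untouched and alters the lag-dependent one, while a shift of origin does the opposite.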

This invariance can be relaxed further yet, if it is possible to invoke
invariance under a time-dependent scale-change such that the related
scale-change for second-order moments is "approximately" the same for
both $t_1$ and $t_2$. For example, suppose that $\varpi\{\xi(t)\} = \varpi\{\xi(A(t)\,t)\}$ where
A(t), as indicated, is a time-dependent scale-change. Under this type of
invariance, the mean of $\xi(t)$ is still a constant (= 0), but if the scale-
change is such that
$\varpi\{\xi(A(t_1)t_1),\xi(A(t_2)t_2)\} \approx \varpi\{\xi(A(t)t_1),\xi(A(t)t_2)\}$, where t is
some "average" value between $t_1$ and $t_2$, say $t = (t_1t_2)^{1/2}$ or maybe even
$t = (t_1+t_2)/2$, then the second-order correlation, again, is still a function of
$t_2/t_1$ only. The important feature for this invariance to manifest is that
there exist a time-dependent scale which is "common" to both $t_1$ and $t_2$.
The  knowledge  that  certain  properties  of  a  particular  physical 
phenomenon  actually  remain  preserved  during  a  considered  evolution,  in 
any  sense,  can  be  of  great  help  not  only  in  simplifying  any  governing 
equations,  but  also  in  eventually  leading  to  their  solution.  At  the  very 
least  it  can  lead  to  new  and  different  insights. 

An immediate advantage of this formulation is that for purposes of
second-order moment analyses, all nonstationary behavior in x(t) is
carried, as it should be, by the time-dependent rms of the signal and not
by the correlation structure. Recall that the absolute value of the
correlation between $\xi(t_1)$ and $\xi(t_2)$ is a number greater than or equal to
zero and less than or equal to unity which is a measure of that fraction
of $\xi(t_1)$ that "follows" $\xi(t_2)$ (or vice versa). Historically, correlations have
been  used  by  statistical  analysts  to  determine  a  number  of  statistics  of 
random  signals  which  are  different  and  distinct  from  the  classical  mode, 
median,  and  mean.  A  possible  disadvantage  of  this  formulation  is  that 
the  imposition  of  scale-invariance  has  come  at  the  expense  of 
translation  invariance.  Translation  invariance  (stationarity),  though, 
has  never  had  much  practical  significance,  since  most  random  behavior 
encountered in nature is not of the translation invariant variety and,
moreover, in a realistic scenario, the location of the origin of coordinates
(time  in  this  case)  during  any  physical  measurement  of  random  behavior 
is  invariably  held  fixed,  and  therefore  "privileged,"  by  the  analyst.  Scale 
invariance  acknowledges  only  "larger  than"  and  "smaller  than,"  but  not 
where  one  ends  and  the  other  begins.  While  translation  invariance 
focuses  on  motion  through  time  and  has  no  preferred  origin,  scale- 
invariance  focuses  on  motion  through  scales  and  has  no  preferred  scale. 
Random  behavior  typically  encountered  in  nature  is  exactly  this,  since  at 
any  time  instant  it  can  be  safely  assumed  that  a  multitude  of  scales 
manifest  in  the  behavior,  with  none  of  these  scales  ever  ascending  to  a 
"preferred"  status. 


Simple properties of the formulated invariance are $R(\alpha) = R(\alpha^{-1})$ and
$R(1) = 1$. These are respectively analogous to $R(\tau) = R(-\tau)$ and $R(0) = 1$
of the classical formulation. These two properties suggest that $R(\alpha)$ in
the scale-invariant formulation has a geometric shape similar to that
depicted in Figure 1. For third-order correlations, the corresponding
symmetry properties are $S(\alpha_1,\alpha_2) = S(\alpha_2,\alpha_1) = S(\alpha_1^{-1},\alpha_2/\alpha_1)$,
where $\alpha_1 = t_2/t_1$, $\alpha_2 = t_3/t_1$, and $Q(t_1,t_2,t_3) = \nu^3(t)\,S(\alpha_1,\alpha_2)$.
Here $t = (t_1t_2t_3)^{1/3}$, and note that S(1,1) has no a priori defined value,
since third-order correlations are zero for gaussian signals and non-zero
for any signal which has a pdf non-symmetric about its mean. These
rather peculiar symmetry properties impose upon $S(\alpha_1,\alpha_2)$ a distinctly
non-arbitrary algebraic form. Specifically, $S(\alpha_1,\alpha_2)$ must be expressible
in terms of the forms $\alpha_1\alpha_2/(\alpha_1+1+\alpha_2)^3$,
$\alpha_1\alpha_2/(\alpha_1+\alpha_1\alpha_2+\alpha_2)^{3/2}$, etc., since
these forms (and others like these) automatically satisfy the required
argument symmetry.

FIGURE 1: Scale-Invariant Correlation (abscissa: Ratio of Lengths)
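
The argument symmetry required of the third-order correlation can be verified numerically. In the Python sketch below the two algebraic forms are taken with exponents 3 and 3/2, respectively, which are the values for which both symmetries hold; the helper names, the sample points, and the tolerance are assumptions made for illustration. A form with the exponent 2 is included as a counterexample.

```python
def form_a(a1, a2):
    """a1*a2 / (a1 + 1 + a2)**3"""
    return a1 * a2 / (a1 + 1.0 + a2) ** 3

def form_b(a1, a2):
    """a1*a2 / (a1 + a1*a2 + a2)**(3/2)"""
    return a1 * a2 / (a1 + a1 * a2 + a2) ** 1.5

def counterexample(a1, a2):
    """Same numerator as form_a but with exponent 2, which breaks the second symmetry."""
    return a1 * a2 / (a1 + 1.0 + a2) ** 2

def symmetric(S, a1, a2, tol=1e-12):
    """Check S(a1, a2) == S(a2, a1) == S(1/a1, a2/a1) to a relative tolerance."""
    s0 = S(a1, a2)
    swap_ok = abs(S(a2, a1) - s0) <= tol * abs(s0)
    invert_ok = abs(S(1.0 / a1, a2 / a1) - s0) <= tol * abs(s0)
    return swap_ok and invert_ok

samples = [(0.5, 3.0), (2.0, 7.0), (1.0, 1.0), (0.1, 10.0)]
all_ok = all(symmetric(S, a1, a2) for S in (form_a, form_b) for a1, a2 in samples)
```

Written in terms of the three times, the first form is the product of the times divided by the cube of their sum, and the second is the product divided by the 3/2 power of the sum of pairwise products, so both are symmetric under any relabeling of the three times.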


6.  Generic  Example 

Of  special  interest  to  the  practitioner,  though,  is  "what  types  of 
nonstationary  behavior  encountered  in  'real  world'  scenarios  can  ideally 
be  characterized  in  this  scale-invariant  fashion?"  Consider  again,  then, 
the  random  process  x(t)  discussed  earlier  which  was  later  expressed  as  x(t) 
$\approx \sigma(t)\xi(t/L)$. In order for a random signal to be reducible to this form it is
necessary that its correlation function(s) adhere to a somewhat
specialized algebraic structure. For example, normalization of the
two-time correlation with respect to the square of the rms is natural and
deducible from the Schwarz inequality. This produces in the "standard"
notation the normalized correlation function $R(t_1,t_2)$, which in said
notation re-writes as $R(t,\tau)$ where t and $\tau$ have been previously defined. In
applied work it is not uncommon to let the t-dependence of $R(t,\tau)$ be
carried by a single time scale, say L(t), or possibly even a t- and
$\tau$-dependent time scale, say $L(t,\tau)$, such that $R(t,\tau) \Rightarrow R\{L(t),\tau\}$ or
$R(t,\tau) \Rightarrow R\{L(t,\tau),\tau\}$. Whichever mode is invoked, though, the two
coordinates of interest, viz. L and $\tau$, can be converted into the two
principal coordinates, $\phi = (L\tau)^{1/2}$ and $\theta = (\tau/L)$, such that
$\tau = \phi\,\theta^{1/2}$, $L = \phi/\theta^{1/2}$, and $R\{L,\tau\} \Rightarrow R(\phi,\theta)$.
If the condition of scale invariance is now imposed on $R(\phi,\theta)$, it follows
that it cannot be a function of $\phi$, since any (pure scaling) transformation
which carries $t \Rightarrow At$ also carries $L \Rightarrow AL$, leaving $\theta$ unaffected while
taking $\phi \Rightarrow A\phi$. Under this constraint, the originally measured random
signal can now be written as $x(t) \approx \sigma(t)\xi(\theta)$ with the $\approx$ sign required
since scale invariance has been artificially imposed on $\xi(t)$, and is ideally
not an intrinsic feature of any normalized random signal. Note that x(t)
is itself neither scale nor translation invariant since the time-dependent
rms of x(t) is scale and origin dependent. Translation invariance, though,
can be imposed in an "approximate" sense on the scale invariant $\xi(\theta)$
such that the correlation between $\xi(\theta_1)$ and $\xi(\theta_2)$ is a function of
$\theta_2 - \theta_1$ only, where $\theta_2 = t_2/L_2$ and $\theta_1 = t_1/L_1$. This is possible only if
$t_2/L_2 - t_1/L_1 \approx (t_2 - t_1)/L$, i.e., if a scale "common" to both $t_1$ and $t_2$
can be found such that the difference between the scale invariant
coordinates scaled with different scales is approximately equal to the
difference when the coordinates are scaled with the same scale
(~ "scaled stationarity"). Typically this "common scale" is defined either
as $(L_1+L_2)/2$ or as $(L_1L_2)^{1/2}$. In this way, "global" information, through
scale and (approximate) translation invariance, and "local" knowledge,
through scale and origin dependence, become equal partners in a viable
representation of random behavior, with neither attempting to deny the
other by unduly extending its own boundaries. For such potential
formulations, the correlation between $\xi(\theta_1)$ and $\xi(\theta_2)$ can also be written
as $R(\alpha) = R(\theta_2/\theta_1)$ as well as $R(\theta_2-\theta_1)$.
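
The behavior of the principal coordinates under a pure rescaling can be checked with a short Python sketch. It assumes the coordinates are read as phi = (L tau)^(1/2) and theta = tau/L, so that tau = phi theta^(1/2) and L = phi / theta^(1/2); the function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def to_principal(L, tau):
    """Map (L, tau) to (phi, theta), with phi = sqrt(L*tau) and theta = tau/L."""
    return math.sqrt(L * tau), tau / L

def from_principal(phi, theta):
    """Inverse map, returning (L, tau) with L = phi/sqrt(theta), tau = phi*sqrt(theta)."""
    root = math.sqrt(theta)
    return phi / root, phi * root

L, tau, A = 2.0, 0.5, 4.0

phi, theta = to_principal(L, tau)
# A pure scaling multiplies both L and tau by the same factor A.
phi_s, theta_s = to_principal(A * L, A * tau)
# Round trip back to the original coordinates.
L_back, tau_back = from_principal(phi, theta)
```

Under the rescaling theta is unchanged while phi is multiplied by A, which is the reason a scale-invariant correlation can depend on theta but not on phi.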

7.  Spectral  Formulation 

Any  formulation  of  the  theory  of  random  signals  would  be  incomplete 
without  a  spectral  version.  Spectral  analysis,  recall,  resolves  the  signal 
into components which make additive contributions to the energy in the
signal  and  have  clearly  recognizable  physical  meaning.  While  the 
demand  for  additive  contributions  to  the  energy  suggests  that  the  signal 
be  represented  in  terms  of  a  set  of  orthogonal  functions,  it  will  be 
demonstrated  momentarily  that  the  obvious  choice  in  this  formulation  is 
not  the  well-known  trigonometric  functions.  In  trigonometric  analysis, 
it is the wavelength (~ (frequency)$^{-1}$) of a given trigonometric function
which explicitly identifies it and indeed isolates it from other members of
the set. Ordinary Fourier methods correspond, therefore, to a resolution
of the signal into components of different linear size. Additionally,
trigonometric functions neither increase nor decrease their amplitudes at
infinity,  which  is  clearly  not  the  case  in  nonstationary  behavior.  In  the 
approach  formulated  herein,  it  is  not  "size"  but  rather  ratio  of  sizes 
which  defines  the  correlation  structure  of  the  signal. 

The concept of a spectrum for nonstationary processes is not new.
It is in fact discussed in several advanced texts on the subject
of random processes (see, for example, Bendat and Piersol (1971) and
Papoulis (1965)). Recall that a "spectrum," in its most classical sense, is
a fully time-independent function that describes the distribution of
signal energy over frequency space. In this respect it is equivalent to the
corresponding concept for the stationary case. Unfortunately, though, it
is an idea whose acceptance, to date, is not widespread. One reason for
this is the difficulty in finding a suitable set of orthogonal
eigenfunctions which can be utilized to formulate a spectral
decomposition for the general case of nonstationary behavior.
Nonstationarity, by its very nature, implies phase consistency, phase
coherence, etc., between pairs of Fourier frequencies, thus, as stated
earlier, automatically excluding the use of standard complex
exponentials as the orthogonal eigenfunction set. Another is that the
essence of a spectrum for nonstationary phenomena is not completely
clear, and on the surface appears to be somewhat paradoxical. If viewed
properly, though, this paradox can be made to yield useful consequences,
since there is virtually no paradox without some utility. Research on
this topic has been done and indeed reported in the literature (Page,
1952; Lampard, 1954; Turner, 1954; Levin, 1964; Priestley, 1965 and
1967; Loynes, 1968; Nagabhushanam and Bhagavan, 1968; Bhagavan,
1974). Nonetheless, the concept of a spectrum for nonstationary processes
is still not an idea whose formulation is widely accepted nor whose
functional structure has been completely investigated.

The  concept  is  realized  by  formally  extending,  in  a  most  natural  sort  of 
way,  the  fundamental  relationship  between  the  correlation  and  spectrum 
functions  for  the  stationary  case  to  obtain,  for  the  nonstationary  case, 
the expression

$$Q(t_1,t_2) = (2\pi)^{-2} \iint \exp\{-i(\omega_1 t_1 - \omega_2 t_2)\}\,\Phi(\omega_1,\omega_2)\,d\omega_1\,d\omega_2 . \tag{6}$$

Here, $\Phi(\omega_1,\omega_2)$ is the Fourier frequency spectrum for nonstationary
processes, and is historically designated a generalized frequency spectrum
or  generalized  power  spectrum.  The  word  "formally"  is  emphasized  above 
since  the  time-dependent  nature  of  second-order  correlations  for  any 
nonstationary  process  requires  that  Equ.  (6)  cannot  be  an  equality  in 
general,  i.e.  complex  exponentials,  by  their  very  nature,  cannot  be  the 
natural  eigenfunctions  for  any  (save  the  most  highly  specialized)  time- 
dependent  dynamical  system.  Analysis  of  a  simple  spring-mass  system 
with  a  time-dependent  spring  (or  mass)  immediately  establishes  this 
result. 
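
The coupling between pairs of Fourier frequencies can be illustrated numerically. The following is a minimal sketch, not taken from the text: it assumes a discrete, periodic analogue of Equ. (6) in which the generalized spectrum is the two-dimensional DFT of a sampled correlation matrix, and it uses two hypothetical correlation models (an exponential circular correlation, and the same correlation modulated by a slowly varying gain). All function and variable names are illustrative.

```python
import numpy as np

def generalized_spectrum(Q):
    """Discrete analogue of a two-frequency spectrum: Phi = F Q F^H.

    F is the unnormalized DFT matrix, so Phi[k, l] couples the Fourier
    frequencies k and l of the sampled correlation Q[t, s].
    """
    n = Q.shape[0]
    # FFT down the columns applies F on the left; the inverse FFT along
    # the rows (times n) applies F^H on the right.
    return np.fft.ifft(np.fft.fft(Q, axis=0), axis=1) * n

def off_diagonal_fraction(Phi):
    """Fraction of |Phi|^2 lying off the line k == l."""
    total = np.sum(np.abs(Phi) ** 2)
    diagonal = np.sum(np.abs(np.diag(Phi)) ** 2)
    return 1.0 - diagonal / total

n = 128
idx = np.arange(n)
lag = (idx[:, None] - idx[None, :]) % n
circular_lag = np.minimum(lag, n - lag)

# Assumed stationary model: correlation depends on the (circular) lag only.
Q_stationary = np.exp(-circular_lag / 5.0)

# Assumed nonstationary model: the same process with a time-varying gain.
gain = 1.0 + 0.5 * np.cos(2.0 * np.pi * idx / n)
Q_nonstationary = gain[:, None] * Q_stationary * gain[None, :]

frac_stationary = off_diagonal_fraction(generalized_spectrum(Q_stationary))
frac_nonstationary = off_diagonal_fraction(generalized_spectrum(Q_nonstationary))

print(f"off-diagonal energy, stationary:    {frac_stationary:.2e}")
print(f"off-diagonal energy, nonstationary: {frac_nonstationary:.2e}")
```

In this setting the stationary model places all of its energy on the line of equal frequencies, while the modulated model spreads a finite fraction onto neighboring frequency pairs.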

In order to establish, though, through purely mathematical
arguments, the existence of a spectrum function for nonstationary
signals in general, it is only necessary to adapt the results of a theorem
peculiar to the spectral analysis of time series. The essence of the
theorem is that any random process, $z(t)$, even a nonstationary one, with
a continuous correlation function has an orthogonal expansion with
respect to some appropriate set of eigenfunctions $Y_n(t)$, i.e. for any $z(t)$, it
is always possible to write


$$z(t) = \underset{N\to\infty}{\mathrm{l.i.m.}} \sum_{n=1}^{N} a_n Y_n(t) , \tag{7}$$

where the $a_n$ are random coefficients, l.i.m. denotes limit in the mean, and
it can be shown that $Y_n(t)$ and the $a_n$ are determined by solving the
integral equation


$$\int Q(t,s)\,Y(s)\,ds = \delta^2\,Y(t) , \tag{8}$$

where $\delta$ is a constant. The condition for which there exists an infinity
of  solutions  to  Equ.  (8)  is  a  premise  of  the  theory  of  integral  equations, 
and  will  not  be  addressed  here.  The  reader  who  prefers  more  discussion 
of this theory is referred to Chapter III of Courant and Hilbert (1954).
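
The eigenfunction problem of Equ. (8) can be approximated on a grid. The sketch below is illustrative only and is not part of the original development: it assumes the hypothetical kernel $Q(t,s) = \min(t,s)$ on $[0,1]$ (Brownian motion, a nonstationary process), chosen because its eigenvalues are known in closed form, and it replaces the integral by a midpoint rule. The name `lam` stands for the constant on the right-hand side of Equ. (8).

```python
import numpy as np

# Assumed example kernel: Q(t, s) = min(t, s) on [0, 1] (Brownian motion),
# chosen because its eigenvalues are known in closed form.
n = 400
dt = 1.0 / n
t = (np.arange(n) + 0.5) * dt          # midpoint grid on [0, 1]
Q = np.minimum.outer(t, t)

# Midpoint-rule discretization of  integral Q(t, s) Y(s) ds = lam * Y(t).
lam, vecs = np.linalg.eigh(Q * dt)
order = np.argsort(lam)[::-1]          # largest eigenvalue first
lam = lam[order]
Y = vecs[:, order] / np.sqrt(dt)       # scale so that sum(Y_n**2) * dt == 1

# Closed-form eigenvalues for this kernel: 1 / ((k - 1/2)**2 * pi**2).
k = np.arange(1, 6)
lam_exact = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)

# Orthonormality of the eigenfunctions under the discrete inner product.
gram = Y.T @ Y * dt

# Reconstruction of the correlation from all n modes.
Q_rebuilt = (Y * lam) @ Y.T

print("numerical:", lam[:5])
print("exact:    ", lam_exact)
```

The eigenfunctions obtained for this kernel are sinusoids that all vanish at the origin, which is one simple instance in which the natural basis is fixed by the correlation itself rather than chosen in advance.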

The spectral version of the theory posed here can be effected by
introducing the variable $p = \ln t$, or alternatively $t = \exp p$. This variable
nonlinearly transforms the normalized random signal $\xi(t)$ into $\varphi(p)$
according to $\xi(t) = \xi(\exp p) \equiv \varphi(p)$. In view of this relation, the correlation
$R(\alpha)$ rewrites as $R(\exp p_2/\exp p_1) \equiv D(p_2 - p_1)$, where $D(p_2 - p_1)$, because it is a
function of the difference $p_2 - p_1$, is the correlation of the stationary
random function $\varphi(p)$. That is, the transformation $t \to p$,
together with the nonlinear transformation from the physical
coordinates $(t_1,t_2)$ to the principal coordinates $\tau = (t_1 t_2)^{1/2}$ and $\alpha = t_2/t_1$,
has converted a nonstationary signal, whose classical normalized
correlation function evolves in time in a manner which can be modeled
by the introduction of a single time-dependent time scale, into a signal
whose normalized second-order correlation function is strictly
stationary. Therefore,

$$\Phi(\beta) = \int R(\alpha)\,\alpha^{-i\beta-1}\,d\alpha , \qquad 0 < \alpha < \infty \tag{9}$$

where $\beta$ is a wave number and $\Phi(\beta)$ the Mellin transform of $R(\alpha)$.
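
The effect of the change of variable $p = \ln t$ can be checked on a concrete case. The sketch below is not from the text. It assumes a hypothetical self-similar process, Brownian motion divided by its rms value, whose normalized correlation is $\min(t_1,t_2)/(t_1 t_2)^{1/2}$ and therefore depends on the ratio $t_2/t_1$ only. On a uniform grid in $p$ the correlation matrix depends only on $p_2 - p_1$ (it is Toeplitz), whereas on a uniform grid in $t$ it does not.

```python
import numpy as np

def normalized_correlation(t):
    """Correlation of B(t)/sqrt(t) for Brownian motion B: min(t1,t2)/sqrt(t1*t2)."""
    return np.minimum.outer(t, t) / np.sqrt(np.outer(t, t))

def toeplitz_defect(M):
    """Largest spread of values along any diagonal of M (0 for a Toeplitz matrix)."""
    n = M.shape[0]
    spreads = []
    for k in range(-(n - 1), n):
        d = np.diag(M, k)
        spreads.append(d.max() - d.min())
    return max(spreads)

# Uniform grid in physical time t: the correlation depends on t1 and t2 separately.
t_linear = np.linspace(1.0, 50.0, 60)
defect_t = toeplitz_defect(normalized_correlation(t_linear))

# Uniform grid in p = ln t: the correlation depends on p2 - p1 only.
p = np.linspace(0.0, np.log(50.0), 60)
R_log = normalized_correlation(np.exp(p))
defect_p = toeplitz_defect(R_log)

# For this assumed model, D(p2 - p1) = exp(-|p2 - p1| / 2).
D_expected = np.exp(-np.abs(p[:, None] - p[None, :]) / 2.0)

print(f"Toeplitz defect on uniform t grid: {defect_t:.3f}")
print(f"Toeplitz defect on uniform p grid: {defect_p:.1e}")
```
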
Baudelaire (1974) has shown that the Mellin transform plays a role in
the analysis of scale-invariant systems similar to the role the Fourier
transform plays in the analysis of shift-invariant systems, viz. it is a
wave number spectrum such that the total kinetic energy per unit mass
is found by summing over all component wave numbers. This utility
arises from the property that the magnitude of the Mellin
transform is insensitive to scaling of the independent variable. The
inverse transform follows from

$$D(p_2 - p_1) = (2\pi)^{-1} \int \Phi(\beta)\,\exp\{i\beta(p_2 - p_1)\}\,d\beta . \tag{10}$$

Invoking the normalized rms relation, $D(0) = R(1) = 1$, produces
$(2\pi)^{-1} \int \Phi(\beta)\,d\beta = 1$, establishing that $\Phi(\beta)$ does indeed retain the same
meaning in the scale-invariant formulation that it has in the classical
formulation. The Mellin transform has already found many viable
applications  in  the  signal  processing  community  (see,  for  example,  Altes 
(1978),  Casasent  and  Psaltis  (1976  and  1977),  Casasent  and  Kraus 
(1978)). 
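
The transform pair of Equ. (9) and (10) can be checked numerically. The following is a minimal sketch under stated assumptions: it takes Equ. (9) to be the Mellin transform with kernel $\alpha^{-i\beta-1}$, uses the substitution $\alpha = \exp u$ so that the integral becomes an ordinary Fourier integral of $D(u) = R(\exp u)$, and adopts a hypothetical Gaussian model $D(u) = \exp(-u^2/2)$ purely for illustration. It verifies the normalization $(2\pi)^{-1}\int \Phi\,d\beta = D(0) = 1$ and the insensitivity of $|\Phi|$ to a rescaling $\alpha \to c\alpha$.

```python
import numpy as np

def mellin_spectrum(D_values, u, beta):
    """Approximate Phi(beta) = integral D(u) exp(-i*beta*u) du on a uniform u grid.

    With alpha = exp(u) this equals integral R(alpha) alpha**(-i*beta - 1) d(alpha).
    """
    du = u[1] - u[0]
    kernel = np.exp(-1j * np.outer(beta, u))
    return kernel @ D_values * du

u = np.linspace(-12.0, 12.0, 2401)      # u = ln(alpha)
beta = np.linspace(-8.0, 8.0, 321)      # wave number grid
dbeta = beta[1] - beta[0]

# Assumed correlation model: D(u) = exp(-u**2 / 2), so that D(0) = R(1) = 1.
D = np.exp(-u ** 2 / 2.0)
Phi = mellin_spectrum(D, u, beta)

# Normalization: (2*pi)**-1 * integral Phi(beta) d(beta) should be close to 1.
normalization = np.sum(Phi.real) * dbeta / (2.0 * np.pi)

# Rescaling alpha -> c*alpha shifts u by ln(c); only the phase of Phi changes.
c = 3.0
D_scaled = np.exp(-(u + np.log(c)) ** 2 / 2.0)
Phi_scaled = mellin_spectrum(D_scaled, u, beta)
magnitude_change = np.max(np.abs(np.abs(Phi_scaled) - np.abs(Phi)))

print(f"normalization: {normalization:.6f}")
print(f"largest change in |Phi| under rescaling: {magnitude_change:.1e}")
```

For the Gaussian model the spectrum is available in closed form, $\Phi(\beta) = (2\pi)^{1/2}\exp(-\beta^2/2)$, which gives an independent check on the quadrature.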

8.  Concluding  Remarks 

The invariance property considered herein is important not only
because it is characteristic of some as yet undetermined conservation
principle, but also because it unifies processes which are traditionally
stationary with those which are traditionally nonstationary. For example,
the already formulated $R(\alpha)$ unifies certain types of nonstationary
random processes with classical stationary processes, since it is identical
for both. Note that the ratio of $t_2$ and $t_1$ is the same whether $L$ is a
function of time or is in fact constant. In other words, in $R(\alpha)$ there is no
distinction between a stationary random process and a self-similar or
self-preserving nonstationary random process. This scale-independence
reinforces  the  belief  that  since  length  scales  are  always  defined  by  the 
analyst  in  some  artificial  or  subjective  manner,  the  actual  physical 
phenomenon  under  consideration  should  nonetheless  exhibit  features 
which  are  natural  to  the  phenomenon  itself  and  independent  of  the  scale 
chosen by the observer. Such behavior, by the way, is contrary to the
notion of twentieth-century physics that a particular numerical result
should always depend upon the relation of observed phenomena to
observer. Invariance, then, is perhaps the major conceptual tool by
which the scientist/engineer exploits the unity and interrelation of all
stochastic behavior. In daily professional endeavors the analyst is
typically unaware of this unity, and rather naturally tends to divide
random processes into separate classes. This division is indeed useful
and often necessary, but it is nonetheless an artifact of the "rational"
type of knowledge, and not an intrinsic feature of nature (which more
closely adheres to a principle of plenitude). Divisions, comparisons,
categorizations, etc., create a world of distinctions, and any information
about reality gleaned from such procedures is inherently limited. In
conceptualizing  about  nature  the  scientist  is  faced  with  the  same  kind  of 
problem  as  the  three  blind  men  who  touched  the  elephant;  only  an 
approximate  representation  of  reality  can  be  expected. 

More important, though, is the eduction of clear evidence that even
though the statistical structure of a particular nonstationary random
process may evolve in time in virtually any fashion, there exists
nonetheless some well-defined invariance to the entire phenomenon. This
invariance is what stimulates the quest for those physical laws which
bridge the gap between stationary and nonstationary behavior. Thus
within the most disorderly of phenomena, there undoubtedly resides an
underlying order co-existing with the disorder. In a universe governed by
entropy,  which  stipulates  that  everything  tends  toward  greater  and  ever 
greater  disorder,  and  from  which  there  appears  to  be  no  means  of  appeal, 
how  does  such  order  arise,  why  does  it  persist,  and  what  are  the  as  yet 
undiscovered  laws  which  explain  it?  When  formulated,  such  laws  will 
answer  these  questions  and  simultaneously  provide  the  "missing  link" 
between these two species of phenomena.